Missing data imputation

Missing values are common in real-world data processing. Causes include data entry errors, deliberately withheld information, and fraud. This article discusses the cases in which handling missing data with simple methods leads to errors in models and decision-making.
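
As a rough illustration of how a simple method can distort results, the sketch below fills gaps with the column mean and compares the spread before and after. The data are synthetic, the 30% missingness rate is an arbitrary assumption, and `mean_impute` is a hypothetical helper rather than anything taken from the article.

```python
import random
import statistics


def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


random.seed(0)

# Synthetic "true" data: 1000 draws from a normal distribution (assumed).
complete = [random.gauss(50.0, 10.0) for _ in range(1000)]

# Assumed missingness: about 30% of values removed completely at random.
with_gaps = [None if random.random() < 0.3 else v for v in complete]

observed = [v for v in with_gaps if v is not None]
imputed = mean_impute(with_gaps)

sd_observed = statistics.stdev(observed)
sd_imputed = statistics.stdev(imputed)

# The mean is preserved, but the spread shrinks because every filled
# value sits exactly at the mean.
print(f"std of observed values:   {sd_observed:.2f}")
print(f"std after mean imputation: {sd_imputed:.2f}")
```

Every filled value adds nothing to the sum of squared deviations, so the standard deviation after imputation is always lower than that of the observed values. A model trained on the imputed column would likely be overconfident about its variability.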

Scalable CLOPE algorithm for clustering categorical data

Splitting categorical and transactional data sets in large databases into groups with similar attributes is one of the most important tasks in data mining. In most cases, traditional clustering algorithms are not effective on large databases. This article describes CLOPE, a scalable heuristic algorithm, which allows for high quality an...