Efficient Dimension Reduction Technique for Basic K-Means Clustering Algorithm

Authors

  • Dauda Usman
  • Ismail Mohamad

DOI:

https://doi.org/10.11113/matematika.v29.n.598

Abstract

K-means clustering is a widely studied problem in a variety of application domains. The computational complexity of the basic k-means algorithm is very high, and the number of distance calculations grows with the dimensionality of the data. Several algorithms have been proposed to improve the performance of the basic k-means. Here we investigate the behavior of the basic k-means clustering algorithm and two alternatives to it, and we analyze the performance of three different standardization methods. We show that z-score standardization and principal component analysis are the best preprocessing methods for simplifying the analysis and visualizing a multidimensional dataset. The results reveal that z-score outperforms min-max and decimal scaling, and that principal component analysis picks up the dimensions with the largest variances. Our results also provide effective ways to solve k-means clustering problems.

Keywords: Decimal Scaling; K-Means Clustering; Min-Max; Principal Component Analysis; Standardization; z-score.

2010 Mathematics Subject Classification: 62H30; 68T10
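For readers unfamiliar with the preprocessing methods named in the abstract, the following is a minimal sketch of the three standardization methods and a PCA projection, using their common textbook definitions. It is not taken from the paper: the function names, the toy matrix `X`, and the choice of NumPy are assumptions made for illustration, and the sketch assumes every column is non-constant and has a nonzero maximum absolute value.

```python
import numpy as np

def z_score(X):
    """Shift each column to mean 0 and scale it to unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def min_max(X):
    """Rescale each column linearly onto the interval [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

def decimal_scaling(X):
    """Divide each column by 10**j, with j the smallest integer
    such that the largest absolute value in the column is below 1."""
    max_abs = np.abs(X).max(axis=0)
    j = np.floor(np.log10(max_abs)) + 1
    return X / 10.0 ** j

def pca_project(X, n_components):
    """Project centered data onto the directions of largest variance."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# Hypothetical toy data: two features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 900.0]])

X_z = z_score(X)
X_mm = min_max(X)
X_dec = decimal_scaling(X)
X_pca = pca_project(X_z, n_components=1)
```

In this sketch, PCA is applied to the z-scored data so that the feature with the larger raw scale does not dominate the leading component; whether the paper combines the two steps in this order is not stated in the abstract.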

Published

2013-06-01

Issue

Section

Mathematics