Supervised Clustering based on a Multi-objective Genetic Algorithm

Authors

Assoc. Prof. Dr. Surapong Auwatanamongkol (สุรพงค์ เอื้อวัฒนามงคล), Mrs. Vipa Thananant (วิภา ธนะนันท์)

Published

Pertanika Journal of Science and Technology

Abstract

Supervised clustering organizes data instances into clusters on the basis of both the similarities between the instances and their class labels. It seeks to meet multiple objectives, such as compactness of clusters, homogeneity of the data in each cluster with respect to class labels, and separateness of clusters. With these objectives in mind, this paper proposes a new supervised clustering algorithm, named SC-MOGA, based on a multi-objective crowding genetic algorithm. The algorithm searches for the optimal clustering solution that simultaneously achieves the three objectives mentioned above. SC-MOGA performs very well on a small dataset, but on a large dataset it may fail to converge to an optimal solution or may take a very long time to converge. Hence, a data sampling method based on the Bisecting K-Means algorithm is also introduced to find representatives for supervised clustering. This method groups the data instances of a dataset into small clusters, each containing instances with the same class label; data representatives are then randomly selected from each cluster. The experimental results show that SC-MOGA with the proposed data sampling method is very effective: it outperforms three previously proposed supervised clustering algorithms, namely SRIDHCR, LK-Means and SCEC, in terms of four cluster validity indexes. The results also show that the proposed data sampling method not only reduces the number of data instances to be clustered by SC-MOGA, but also enhances the quality of the clustering results.
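
The data sampling step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`sample_representatives`, `_bisect`), the parameters `max_size` and `per_cluster`, and the stopping rule (keep bisecting until every cluster holds a single class label and at most `max_size` instances) are assumptions made for illustration, since the abstract does not specify them.

```python
import random


def _dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def _bisect(idx, X, rng, iters=10):
    """Split the index list idx (len >= 2) in two with plain 2-means."""
    c0, c1 = (X[i] for i in rng.sample(idx, 2))
    for _ in range(iters):
        left = [i for i in idx if _dist2(X[i], c0) <= _dist2(X[i], c1)]
        left_set = set(left)
        right = [i for i in idx if i not in left_set]
        if not left or not right:
            break
        c0 = _mean([X[i] for i in left])
        c1 = _mean([X[i] for i in right])
    if not left or not right:
        # Degenerate case (e.g. duplicate points): fall back to a half split
        # so that every split strictly shrinks the cluster.
        half = len(idx) // 2
        left, right = idx[:half], idx[half:]
    return left, right


def sample_representatives(X, y, max_size=5, per_cluster=1, seed=0):
    """Return (representative indices, list of pure clusters).

    Assumed stopping rule: a cluster is final once all its instances share
    one class label and it has at most max_size instances.
    """
    rng = random.Random(seed)
    pending = [list(range(len(X)))] if X else []
    final = []
    while pending:
        idx = pending.pop()
        pure = len({y[i] for i in idx}) == 1
        if pure and len(idx) <= max_size:
            final.append(idx)
        else:
            pending.extend(_bisect(idx, X, rng))
    reps = []
    for idx in final:
        # Random selection of representatives from each small cluster.
        reps.extend(rng.sample(idx, min(per_cluster, len(idx))))
    return sorted(reps), final
```

Under these assumptions, the returned indices would form the reduced dataset passed to a clustering algorithm such as SC-MOGA; a smaller `max_size` yields more clusters and therefore more representatives.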
