Optimization Inspired on Herd Immunity Applied to Non-Hierarchical Grouping of Objects
Keywords: Data mining, heuristic, combinatorial optimization, bio-inspired computing
Abstract: Non-hierarchical grouping is one of the most important operations in data analysis. Given a finite collection of objects, and without any prior information about the elements to be classified, it partitions the collection into non-overlapping subsets, or groups, so that the elements of a given group are more similar to each other than to the elements of other groups. In this context, this study proposes applying a metaheuristic inspired by herd immunity to the non-hierarchical grouping of objects and compares its results with those of four other grouping strategies described in the literature. Specifically, 33 benchmark collections were partitioned by the proposed algorithm, by a metaheuristic inspired by particle swarms, by a genetic algorithm, by the K-means algorithm, and by a metaheuristic inspired by the thermal annealing process (simulated annealing). The resulting partitions were compared using 10 different evaluation measures, and the comparison indicates that the partitions produced by the herd-immunity-inspired metaheuristic may, in certain respects, be more favorable than those obtained by the other clustering methods.