A new AI-based method for clustering survey responses
Lublin University of Technology
Submission date: 2023-07-25
Acceptance date: 2023-12-01
Publication date: 2023-12-18
Corresponding author
Jan Franciszek Laskowski   

Lublin University of Technology
JoMS 2023;54(Special Issue 5):355-377
Many research projects, particularly in the social sciences, depend on clustering survey responses. However, traditional clustering algorithms have several drawbacks when applied to survey data. Recent developments in artificial intelligence (AI) and machine learning (ML) have made it possible to analyze survey data more effectively. The aim of this article is to present a new AI-based method for clustering survey responses using a Variational Autoencoder (VAE).
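The abstract names a Variational Autoencoder but does not give its architecture or loss. As a hedged sketch of the standard VAE building blocks only, the snippet below shows the reparameterization trick and a per-respondent loss made of a reconstruction term plus the closed-form KL divergence between the encoder's diagonal Gaussian q(z|x) = N(mu, diag(sigma^2)) and a standard normal prior. The encoder and decoder networks are omitted: `mu`, `logvar`, and `x_hat` stand in for their outputs, and all function names are hypothetical. How the article turns latent codes into clusters is not stated here; grouping the latent means (for example with k-means) is one common choice, but that is an assumption.

```python
import math
import random


# Hypothetical helper names; the encoder and decoder networks are not shown.
# mu and logvar stand in for the encoder's outputs for one respondent,
# x_hat for the decoder's reconstruction of that respondent's answers.

def reparameterize(mu, logvar, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), where sigma = exp(logvar / 2)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I))."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def squared_error(x, x_hat):
    """Sum of squared differences between the answers and their reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def vae_loss(x, x_hat, mu, logvar):
    """Reconstruction error plus KL term.

    Equals the negative ELBO up to an additive constant when the decoder is a
    Gaussian with fixed variance 1/2.
    """
    return squared_error(x, x_hat) + kl_to_standard_normal(mu, logvar)


rng = random.Random(0)
mu, logvar = [0.5, -1.0], [0.0, -2.0]
z = reparameterize(mu, logvar, rng)  # a sampled latent code for this respondent
loss = vae_loss([1.0, 0.0, 2.0], [0.9, 0.1, 1.8], mu, logvar)
```

Sampling through `reparameterize` keeps the randomness in `eps`, which is what lets gradients flow to `mu` and `logvar` when the same computation is written in an autodiff framework.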

Material and methods:
To assess clustering effectiveness, the new VAE clustering method was compared with k-means, PCA followed by k-means, and agglomerative hierarchical clustering, using three metrics: the Silhouette score, the Calinski-Harabasz score, and the Davies-Bouldin score.
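The three evaluation metrics have standard definitions: the mean silhouette, the ratio of between-cluster to within-cluster dispersion (Calinski-Harabasz), and the average worst-case ratio of cluster scatter to centroid separation (Davies-Bouldin). The dependency-free sketch below uses Euclidean distance and is meant only to show what each score measures; it is not the article's evaluation code, and the function names, while mirroring scikit-learn's, are our own. Higher Silhouette and Calinski-Harabasz values indicate better clustering, whereas lower Davies-Bouldin values do. The toy data are hypothetical.

```python
import math
from collections import defaultdict


def _groups(X, labels):
    """Map each cluster label to the list of its points."""
    groups = defaultdict(list)
    for x, lab in zip(X, labels):
        groups[lab].append(x)
    return groups


def _centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coord) / len(points) for coord in zip(*points)]


def silhouette_score(X, labels):
    """Mean of s(i) = (b - a) / max(a, b); ranges over [-1, 1], higher is better."""
    groups = _groups(X, labels)
    total = 0.0
    for x, lab in zip(X, labels):
        own = groups[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute s(i) = 0 by convention
        # a: mean distance to the other members of x's own cluster (x itself adds 0).
        a = sum(math.dist(x, y) for y in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(math.dist(x, y) for y in pts) / len(pts)
                for other, pts in groups.items() if other != lab)
        m = max(a, b)
        total += (b - a) / m if m > 0 else 0.0
    return total / len(X)


def calinski_harabasz_score(X, labels):
    """Between- over within-cluster dispersion, each divided by its degrees of freedom."""
    groups = _groups(X, labels)
    n, k = len(X), len(groups)
    overall = _centroid(X)
    between = within = 0.0
    for pts in groups.values():
        c = _centroid(pts)
        between += len(pts) * math.dist(c, overall) ** 2
        within += sum(math.dist(p, c) ** 2 for p in pts)
    return (between / (k - 1)) / (within / (n - k))


def davies_bouldin_score(X, labels):
    """Average over clusters of the worst (s_i + s_j) / d(c_i, c_j); lower is better."""
    groups = _groups(X, labels)
    cents = {lab: _centroid(pts) for lab, pts in groups.items()}
    # s_i: mean distance of cluster i's points to its own centroid.
    scatter = {lab: sum(math.dist(p, cents[lab]) for p in pts) / len(pts)
               for lab, pts in groups.items()}
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in groups if j != i)
             for i in groups]
    return sum(worst) / len(groups)


# Hypothetical toy data: two well-separated groups of three points each.
X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = [0, 0, 0, 1, 1, 1]
mixed = [0, 1, 0, 1, 0, 1]
for name, labels in (("good", good), ("mixed", mixed)):
    print(name,
          round(silhouette_score(X, labels), 3),
          round(calinski_harabasz_score(X, labels), 1),
          round(davies_bouldin_score(X, labels), 3))
```

On this toy data the well-separated labelling scores better on all three metrics than the shuffled one, which is the direction each metric is designed to reward.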

For the Silhouette score, the VAE method achieved a 69% higher average clustering effectiveness than the other methods. For the Calinski-Harabasz score and the Davies-Bouldin score, the VAE method outperformed the other methods by 164% and 111%, respectively.
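The abstract does not spell out how these percentages are computed. One reading, given here purely as an assumption, is a relative difference against the baseline value. Because Davies-Bouldin is a lower-is-better metric and a reduction measured relative to the baseline cannot exceed 100%, a figure of 111% would then have to express the baseline's excess relative to the VAE's own score. How the three baselines are averaged is also not stated. The hypothetical helper below encodes that reading; the example numbers are illustrative and are not the article's results.

```python
def relative_gain(vae_score, baseline_score, higher_is_better=True):
    """Percentage advantage of the VAE score over one baseline score (hypothetical helper).

    Higher-is-better metrics (Silhouette, Calinski-Harabasz):
        100 * (vae - baseline) / baseline
    Lower-is-better metrics (Davies-Bouldin), assumed here to be measured
    against the VAE's own value:
        100 * (baseline - vae) / vae
    """
    if higher_is_better:
        return 100.0 * (vae_score - baseline_score) / baseline_score
    return 100.0 * (baseline_score - vae_score) / vae_score


# Illustrative values only, not taken from the article.
print(relative_gain(3.0, 2.0))                          # higher-is-better metric
print(relative_gain(0.5, 1.0, higher_is_better=False))  # lower-is-better metric
```

Note that the Silhouette score can be negative, in which case a percentage taken relative to the baseline value is not meaningful.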

The VAE method produced the most effective grouping of respondents' answers and captured complex relationships and patterns in the data. In addition, the method is suitable for analyzing different types of survey data (continuous, categorical, and mixed) and is robust to noise and missing data.
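The abstract states that the method handles continuous, categorical, and mixed data as well as missing answers, but does not describe the preprocessing. A common way to feed such responses to a VAE, shown here only as an assumption and not as the authors' pipeline, is to standardize continuous items, one-hot encode categorical items, and carry a mask so that missing answers are excluded from the reconstruction loss. The schema, column names, and helper functions are hypothetical.

```python
import math


def column_stats(rows, col):
    """Mean and standard deviation of a continuous column, ignoring missing (None) values."""
    observed = [r[col] for r in rows if r.get(col) is not None]
    mean = sum(observed) / len(observed)
    var = sum((v - mean) ** 2 for v in observed) / len(observed)
    return mean, math.sqrt(var) or 1.0  # guard against a constant column


def encode_response(row, schema, stats):
    """Turn one response into (values, mask) for a VAE input layer.

    Continuous answers are standardized; categorical answers are one-hot encoded.
    A missing answer (None) is encoded as zeros, and its mask entries are 0 so a
    masked reconstruction loss can skip it.
    """
    values, mask = [], []
    for col, kind, categories in schema:
        v = row.get(col)
        observed = 0.0 if v is None else 1.0
        if kind == "continuous":
            mean, std = stats[col]
            values.append(0.0 if v is None else (v - mean) / std)
            mask.append(observed)
        else:  # "categorical"
            values.extend(1.0 if v == c else 0.0 for c in categories)
            mask.extend([observed] * len(categories))
    return values, mask


def masked_squared_error(x, x_hat, mask):
    """Reconstruction error counted only on observed entries."""
    return sum(m * (a - b) ** 2 for a, b, m in zip(x, x_hat, mask))


# Hypothetical schema and responses for illustration.
schema = [("age", "continuous", None),
          ("satisfaction", "categorical", ["low", "mid", "high"])]
rows = [{"age": 30, "satisfaction": "mid"},
        {"age": 50, "satisfaction": None},
        {"age": None, "satisfaction": "high"}]
stats = {"age": column_stats(rows, "age")}
encoded = [encode_response(r, schema, stats) for r in rows]
```

Masking the loss, rather than imputing a value and then scoring the model against it, avoids training the decoder to reproduce answers that were never given.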
