ORIGINAL PAPER
ChatGPT 3.5 Passes the Minimum Intelligence Signal Test (MIST). Should we care?
 
Adam Mickiewicz University
 
 
Submission date: 2025-03-17
 
 
Final revision date: 2025-11-15
 
 
Acceptance date: 2025-11-17
 
 
Publication date: 2025-12-29
 
 
Corresponding author
Paweł Łupkowski   

Adam Mickiewicz University
 
 
JoMS 2025;64(4):552-574
 
ABSTRACT
Objectives:
This study examines whether ChatGPT 3.5 can successfully pass the Minimum Intelligence Signal Test (MIST). The study aims both to evaluate ChatGPT’s performance on this benchmark and to consider the implications of such performance for contemporary debates on artificial intelligence.

Material and methods:
For the experiment, ChatGPT 3.5 was used. A total of 4,000 MIST items were retrieved at random from the publicly available part of the Mindpixel Database. All tests used the same simple prompt: “Please answer ‘yes’ or ‘no’ to the following questions”. Responses were recorded exactly as the chatbot returned them. Outputs were compared with the human responses stored in the database, and agreement was measured with Cohen’s Kappa. Responses that violated the yes/no format were examined qualitatively to assess their causes and the reliability of the dataset.

Results:
In six attempts at MIST, ChatGPT’s correctness score reached over 94%, and Cohen’s Kappa values indicated almost perfect agreement with the human-generated responses. Repeated trials showed high internal consistency. Two percent of responses did not conform to the yes/no format, typically because of ambiguous, subjective, or ill-formed items in the database.
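
The agreement measure named above can be illustrated with a minimal sketch. It is not the authors' analysis code: the helper names `normalize` and `cohen_kappa` are hypothetical, the eight toy items are invented rather than taken from the Mindpixel Database, and the rule for deciding which replies violate the yes/no format is an assumption. Cohen's Kappa is computed as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each rater's label frequencies.

```python
from collections import Counter


def normalize(answer):
    """Map a raw reply to 'yes' or 'no'; return None if it breaks the format.

    Assumption: surrounding whitespace, quotes, and final punctuation are
    tolerated; anything else counts as a format violation.
    """
    cleaned = answer.strip().strip(".!\"'").lower()
    return cleaned if cleaned in ("yes", "no") else None


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two non-empty label lists of equal length")
    # Observed agreement: share of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented toy data, NOT items from the Mindpixel Database.
human = ["yes", "no", "yes", "no", "yes", "no", "yes", "no"]
model_raw = ["Yes.", "no", "yes", "No", "It depends on context.",
             "no", "yes", "yes"]

model = [normalize(reply) for reply in model_raw]
# Keep only items where the model reply conforms to the yes/no format.
pairs = [(h, m) for h, m in zip(human, model) if m is not None]
human_kept = [h for h, _ in pairs]
model_kept = [m for _, m in pairs]

accuracy = sum(h == m for h, m in pairs) / len(pairs)
kappa = cohen_kappa(human_kept, model_kept)
invalid_share = model.count(None) / len(model)

print(f"accuracy={accuracy:.3f} kappa={kappa:.2f} invalid={invalid_share:.3f}")
```

On this toy data the sketch reports an accuracy of 6/7, a Kappa of 0.72, and one non-conforming reply out of eight. Under the commonly used interpretation bands, Kappa values above 0.80 are the ones labelled almost perfect agreement.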

Conclusions:
ChatGPT 3.5 clearly passes MIST. Interpreted in the light of Searle’s classical distinction between weak and strong AI, this result leads to the conclusion that ChatGPT exemplifies weak AI: a powerful tool that simulates intelligent behavior without any pretense to intrinsic understanding. The study also further weakens French’s claim that disembodied artificial agents cannot answer common-knowledge (subcognitive) questions. The results also indicate the need to pre-validate the available MIST items and encourage further testing of theoretical proposals from the Turing Test debate, such as the Inverted Turing Test proposed by Watt.
REFERENCES (45)
1.
Bang, Y., Cahyawijaya, S., Lee, N., Dai, W., Su, D., Wilie, B., Lovenia, H., Ji, Z., Yu, T., Chung, W., Do, Q. V., Xu, Y., & Fung, P. (2023). A multitask, multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity. In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 675–718). Association for Computational Linguistics.
 
2.
Block N. (1995), “The mind as the software of the brain,” [in:] An Invitation to Cognitive Science - Thinking, E. Smith, D. Osherson, (eds), The MIT Press, London: 377-425.
 
3.
Carletta J. (1996), “Assessing Agreement on Classification Tasks: The Kappa Statistic,” Computational Linguistics 22(2): 249-254.
 
4.
Copeland, B. J. (Ed.). (2004). The Essential Turing. Clarendon Press.
 
5.
Danziger, S. (2022). Intelligence as a social concept: a socio-technological interpretation of the Turing Test. Philosophy & Technology, 35, 68. https://doi.org/10.1007/s13347....
 
6.
Davis, E. (2023). Benchmarks for automated commonsense reasoning: A survey. ACM Computing Surveys, 56(4), 1-41.
 
7.
Dennett, D. C. (2004). Can machines think?. In C. Teuscher (ed.), Alan Turing: Life and legacy of a great thinker (pp. 295-316). Berlin, Heidelberg: Springer Berlin Heidelberg.
 
8.
Floridi, L., & Chiriatti, M. (2020). GPT-3: Its nature, scope, limits, and consequences. Minds and Machines, 30, 681–694. https://doi.org/10.1007/s11023....
 
9.
French R. (1990), “Subcognition and the Limits of the Turing Test,” Mind 99(393): 53-65.
 
10.
Garner R. (2009), “The Turing hub as a standard for Turing Test interfaces,” [in:] Parsing the Turing Test: Philosophical and Methodological Issues in the Quest for the Thinking Computer, R. Epstein, G. Roberts, G. Beber (eds), Springer Publishing Company: 319-324.
 
11.
Gonçalves, B. (2023a). Can machines think? The controversy that led to the Turing Test. AI & SOCIETY, 38(6), 2499-2509.
 
12.
Gonçalves, B. (2023b). The Turing Test is a Thought Experiment. Minds & Machines, 33, 1–31. https://doi.org/10.1007/s11023....
 
13.
Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D., & Steinhardt, J. (2021). Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300.
 
14.
Hingston, P. (2009). A Turing Test for computer game bots. IEEE Transactions on Computational Intelligence and AI in Games, 1(3), 169–186.
 
15.
Kocoń, J., Cichecki, I., Kaszyca, O., Kochanek, M., Szydło, D., Baran, J., Bielaniewicz, J., Gruza, M., Janz, A., Kanclerz, K., Kocoń, A., Koptyra, B., Mieleszczenko-Kowszewicz, W., Miłkowski, P., Oleksy, M., Piasecki, M., Radliński, Ł., Wojtasik, K., Woźniak, S., & Kazienko, P. (2023). ChatGPT: Jack of all trades, master of none. Information Fusion, 99, 101861. https://doi.org/10.1016/j.inff....
 
16.
Kühl, N., Goutier, M., Baier, L., Wolff, C., & Martin, D. (2022). Human vs. supervised machine learning: Who learns patterns faster?. Cognitive Systems Research, 76, 78-92.
 
17.
Loebner H. (2009), “How to hold a Turing Test contest,” [in:] Parsing the Turing Test: Philosophical and Methodological Issues in the Quest for the Thinking Computer, R. Epstein, G. Roberts, G. Beber (eds), Springer Publishing Company: 173-180.
 
18.
Łupkowski P. (2011), “A Formal Approach to Exploring the Interrogator’s Perspective in the Turing Test,” Logic and Logical Philosophy 20(1-2): 139-158.
 
19.
Łupkowski P., Wiśniewski A. (2011), “Turing interrogative games,” Minds and Machines 21(3): 435-448.
 
20.
Łupkowski P., Rybacka A. (2016), “Non-cooperative Strategies of Players in the Loebner Contest,” Organon F 23(3): 324-365.
 
21.
Łupkowski, P., Krajewska, V. (2018). Immersion level and bot player identification in a multiplayer online game: The World of Warships case study. Homo Ludens, 1 (11), 155-171.
 
22.
Łupkowski, P., Jurowska, P. (2019). Minimum Intelligent Signal Test (MIST) as an Alternative to the Turing Test. Diametros, 16(59), 35-47.
 
23.
Łupkowski, P. (2019). Turing’s 1948 ‘Paper Chess Machine’ Test as a Prototype of the Turing Test. Ruch Filozoficzny, 75(2), 117-128.
 
24.
Martínez-Plumed, F., Barredo, P., Heigeartaigh, S. O., & Hernandez-Orallo, J. (2021). Research community dynamics behind popular AI benchmarks. Nature Machine Intelligence, 3(7), 581-589.
 
25.
Mauldin M.L. (1994), “Chatterbots, Tiny Muds, and the Turing Test: entering the Loebner Prize competition,” [in:] Proceedings of the 12th National Conference on Artificial Intelligence (AAAI-94), Menlo Park (CA): 16-21.
 
26.
Mauldin M. L. (2009), “Going undercover: Passing as human; artificial interest: A step on the road to AI,” [in:] Parsing the Turing Test: Philosophical and Methodological Issues in the Quest for the Thinking Computer, R. Epstein, G. Roberts, G. Beber (eds), Springer Publishing Company: 413-430.
 
27.
McKinstry C. (1997), “Minimum Intelligence Signal Test: an Objective Turing Test,” Canadian Artificial Intelligence (41): 17-18.
 
28.
McKinstry C. (2009), “Mind as Space: Toward the Automatic Discovery of a Universal Human Semantic-affective Hyperspace - A Possible Subcognitive Foundation of a Computer Program Able to Pass the Turing Test,” [in:] Parsing the Turing Test: Philosophical and Methodological Issues in the Quest for the Thinking Computer, R. Epstein, G. Roberts, G. Beber (eds), Springer Publishing Company: 283-300.
 
29.
Fiona Fui-Hoon Nah, Ruilin Zheng, Jingyuan Cai, Keng Siau & Langtao Chen (2023) Generative AI and ChatGPT: Applications, challenges, and AI-human collaboration, Journal of Information Technology Case and Application Research, 25:3, 277-304, DOI: 10.1080/15228053.2023.2233814.
 
30.
Newman A.H., Turing A.M., Jefferson G., Braithwaite R.B. (1952), “Can automatic calculating machines be said to think?”, [in:] The Turing Digital Archive (www.turingarchive.org), Contents of AMT/B/6.
 
31.
Purtill, R. L. (1971). Beating the imitation game. Mind, LXXX(318): 290–294.
 
32.
R Core Team (2013), “R: A language and environment for statistical computing,” R Foundation for Statistical Computing, URL=http://www.R-project.org/.
 
33.
Radanliev, P. (2024) Artificial intelligence: reflecting on the past and looking towards the next paradigm shift, Journal of Experimental & Theoretical Artificial Intelligence, DOI: 10.1080/0952813X.2024.2323042.
 
34.
Echavarría, R. (2025). ChatGPT-4 in the Turing Test. Minds and Machines, 35(1), 8.
 
35.
Searle, John. R. (1980) Minds, brains, and programs. Behavioral and Brain Sciences 3 (3): 417-457.
 
36.
Speer, R., Chin, J., & Havasi, C. (2017, February). Conceptnet 5.5: An open multilingual graph of general knowledge. In Proceedings of the AAAI conference on artificial intelligence (Vol. 31, No. 1).
 
37.
Srivastava, A., Rastogi, A., Rao, A., Shoeb, A. A. M., Abid, A., Fisch, A., Brown, A. R., et al. (2023). Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. Transactions on Machine Learning Research. Retrieved from https://openreview.net/forum?i....
 
38.
Sampson, G. (1973). In defense of Turing. Mind, 82(328), 592–594.
 
39.
Talmor, A., Herzig, J., Lourie, N., & Berant, J. (2019, June). CommonsenseQA: A question answering challenge targeting commonsense knowledge. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 4149-4158).
 
40.
Turing Alan M. (1948), “Intelligent Machinery”. The Turing digital archive (http://www.turingarchive.org), contents of AMT/C/11.
 
41.
Turing A. M. (1950), “Computing machinery and intelligence,” Mind LIX(236): 433-460.
 
42.
Turney P.D. (2001a), “Answering subcognitive Turing Test questions: A reply to French,” Journal of Experimental and Theoretical Artificial Intelligence 13(4): 409-419.
 
43.
Turney P.D. (2001b), “Mining the web for synonyms: PMI-IR versus LSA on TOEFL,” [in:] Proceedings of European Conference on Machine Learning, Springer, Berlin, Heidelberg: 491-502.
 
44.
Viera A.J., Garrett J.M. (2005), “Understanding Interobserver Agreement: The Kappa Statistic,” Family Medicine 37(5): 360-363.
 
45.
Watt, S. (1996). Naive psychology and the inverted Turing Test. Psycoloquy, 7(14), 463-518.
 
eISSN:2391-789X
ISSN:1734-2031