Takale, D., Mahalle, P. & Sule, B. Advancements and applications of generative artificial intelligence. Journal of Information Technology and Sciences 10, 20–27 (2024).
Ramdurai, B. & Adhithya, P. The impact, advancements and applications of generative AI. International Journal of Computer Science and Engineering 10, 1–8 (2023).
Wang, J. et al. Evaluation and analysis of hallucination in large vision-language models (2023). arXiv:2308.15126.
Nwanna, M. et al. AI-driven personalisation: Transforming user experience across mobile applications. Journal of Artificial Intelligence, Machine Learning and Data Science 3, 1930–1937 (2025).
Behare, N., Bhagat, S. & Sarangdhar, P. Revolutionizing Customer Experience With AI-Powered Personalization. In Strategic Brand Management in the Age of AI and Disruption, 439–462 (IGI Global Scientific Publishing, 2025).
Ji, Z. et al. Survey of hallucination in natural language generation. ACM Computing Surveys 55, 1–38 (2023).
Zhang, Y. et al. Siren’s song in the AI ocean: a survey on hallucination in large language models (2023). arXiv:2309.01219.
Huang, L. et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Transactions on Information Systems 43, 1–55 (2025).
Rawte, V. et al. The troubling emergence of hallucination in large language models: an extensive definition, quantification, and prescriptive remediations. In Findings of the Association for Computational Linguistics: EMNLP 2023 (Association for Computational Linguistics, 2023).
Bang, Y. et al. A multitask, multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity (2023). arXiv:2302.04023.
Li, J., Cheng, X., Zhao, W., Nie, J. & Wen, J. HaluEval: A large-scale hallucination evaluation benchmark for large language models (2023). arXiv:2305.11747.
Zhu, Z., Yang, Y. & Sun, Z. HaluEval-Wild: Evaluating hallucinations of language models in the wild (2024). arXiv:2403.04307.
Shao, A. Beyond Misinformation: A Conceptual Framework for Studying AI Hallucinations in (Science) Communication (2025). arXiv:2504.13777.
Massenon, R. et al. Mobile app review analysis for crowdsourcing of software requirements: a mapping study of automated and semi-automated tools. PeerJ Computer Science 10, e2401 (2024).
Gambo, I. et al. Enhancing user trust and interpretability in AI-driven feature request detection for mobile app reviews: an explainable approach. IEEE Access (2024).
Dąbrowski, J., Letier, E., Perini, A. & Susi, A. Analysing app reviews for software engineering: a systematic literature review. Empirical Software Engineering 27, 43 (2022).
Genc-Nayebi, N. & Abran, A. A systematic literature review: Opinion mining studies from mobile app store user reviews. Journal of Systems and Software 125, 207–219 (2017).
Palomba, F. et al. User reviews matter! Tracking crowdsourced reviews to support evolution of successful apps. In 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME), 291–300 (IEEE, 2015).
Fan, A. et al. Large language models for software engineering: Survey and open problems. In 2023 IEEE/ACM International Conference on Software Engineering: Future of Software Engineering (ICSE-FoSE), 31–53 (IEEE, 2023).
Görmez, M., Yılmaz, M. & Clarke, P. Large Language Models for Software Engineering: A Systematic Mapping Study. In European Conference on Software Process Improvement, 64–79 (Springer Nature Switzerland, Cham, 2024).
Khan, W., Daud, A., Khan, K., Muhammad, S. & Haq, R. Exploring the frontiers of deep learning and natural language processing: A comprehensive overview of key challenges and emerging trends. Natural Language Processing Journal 4, 100026 (2023).
Desai, B., Patil, K., Patil, A. & Mehta, I. Large Language Models: A Comprehensive Exploration of Modern AI’s Potential and Pitfalls. Journal of Innovative Technologies 6 (2023).
Koenecke, A., Choi, A., Mei, K., Schellmann, H. & Sloane, M. Careless whisper: Speech-to-text hallucination harms. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 1672–1681 (ACM, 2024).
Moffatt v. Air Canada. McCarthy Tétrault TechLex Blog (2024). Last accessed 2025/05/05.
Maynez, J., Narayan, S., Bohnet, B. & McDonald, R. On faithfulness and factuality in abstractive summarization (2020). arXiv:2005.00661.
Leiser, F. et al. From ChatGPT to FactGPT: A participatory design study to mitigate the effects of large language model hallucinations on users. In Proceedings of Mensch und Computer 2023, 81–90 (ACM, 2023).
Leiser, F. et al. HILL: A hallucination identifier for large language models. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1–13 (ACM, 2024).
Xu, Z., Jain, S. & Kankanhalli, M. Hallucination is inevitable: An innate limitation of large language models (2024). arXiv:2401.11817.
Tonmoy, S. et al. A comprehensive survey of hallucination mitigation techniques in large language models (2024). arXiv:2401.01313.
Martino, A., Iannelli, M. & Truong, C. Knowledge injection to counter large language model (LLM) hallucination. In European Semantic Web Conference, 182–185 (Springer Nature Switzerland, Cham, 2023).
Agrawal, A., Suzgun, M., Mackey, L. & Kalai, A. Do Language Models Know When They’re Hallucinating References? (2023). arXiv:2305.18248.
Jiang, Z., Araki, J., Ding, H. & Neubig, G. How can we know when language models know? On the calibration of language models for question answering. Transactions of the Association for Computational Linguistics 9, 962–977 (2021).
Xiong, M. et al. Can LLMs express their uncertainty? An empirical evaluation of confidence elicitation in LLMs (2023). arXiv:2306.13063.
Khan, J., Qayyum, S. & Dar, H. Large Language Model for Requirements Engineering: A Systematic Literature Review. Research Square (2025).
Min, B. et al. Recent advances in natural language processing via large pre-trained language models: A survey. ACM Computing Surveys 56, 1–40 (2023).
Hariri, W. Unlocking the potential of ChatGPT: A comprehensive exploration of its applications, advantages, limitations, and future directions in natural language processing (2023). arXiv:2304.02017.
Vinothkumar, J. & Karunamurthy, A. Recent advancements in artificial intelligence technology: trends and implications. Quing: International Journal of Multidisciplinary Scientific Research and Development 2, 1–11 (2023).
Farquhar, S., Kossen, J., Kuhn, L. & Gal, Y. Detecting hallucinations in large language models using semantic entropy. Nature 630, 625–630 (2024).
Dhuliawala, S. et al. Chain-of-verification reduces hallucination in large language models (2023). arXiv:2309.11495.
Béchard, P. & Ayala, O. M. Reducing hallucination in structured outputs via retrieval-augmented generation (2024). arXiv:2404.08189.
He, B. et al. Retrieving, rethinking and revising: The chain-of-verification can improve retrieval augmented generation (2024). arXiv:2410.05801.
Liu, F. et al. Exploring and evaluating hallucinations in LLM-powered code generation (2024). arXiv:2404.00971.
Lee, Y. et al. Hallucination by Code Generation LLMs: Taxonomy, Benchmarks, Mitigation, and Challenges (2025). arXiv:2504.20799.
Lin, S., Hilton, J. & Evans, O. TruthfulQA: Measuring how models mimic human falsehoods (2021). arXiv:2109.07958.
Zheng, S., Huang, J. & Chang, K. Why Does ChatGPT Fall Short in Providing Truthful Answers? (2023). arXiv:2304.10513.
Guerreiro, N. et al. Mitigating Hallucinations in Neural Machine Translation through Fuzzy-match Repair. In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, 123–132 (EAMT, 2023).
Chen, N., Lin, J., Hoi, S., Xiao, X. & Zhang, B. AR-miner: mining informative reviews for developers from mobile app marketplace. In Proceedings of the 36th international conference on software engineering, 767–778 (ACM, 2014).
Wu, H., Deng, W., Niu, X. & Nie, C. Identifying key features from app user reviews. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), 922–932 (IEEE, 2021).
Guzman, E. & Maalej, W. How do users like this feature? A fine grained sentiment analysis of app reviews. In 2014 IEEE 22nd International Requirements Engineering Conference (RE), 153–162 (IEEE, 2014).
Ballas, V., Michalakis, K., Alexandridis, G. & Caridakis, G. Automating mobile app review user feedback with aspect-based sentiment analysis. In International Conference on Human-Computer Interaction, 179–193 (Springer Nature Switzerland, Cham, 2024).
Shah, F., Sabir, A. & Sharma, R. A Fine-grained Sentiment Analysis of App Reviews using Large Language Models: An Evaluation Study (2024). arXiv:2409.07162.
Ossai, C. & Wickramasinghe, N. Automatic user sentiments extraction from diabetes mobile apps: an evaluation of reviews with machine learning. Informatics for Health and Social Care 48, 211–230 (2023).
Gambo, I. et al. Extracting Features from App Store Reviews to Improve Requirements Analysis: Natural Language Processing and Machine Learning Approach. International Journal of Computing 17, 1–19 (2025).
Gambo, I., Massenon, R., Ogundokun, R. O., Agarwal, S. & Pak, W. Identifying and resolving conflict in mobile application features through contradictory feedback analysis. Heliyon 10 (2024).
Dam, S., Hong, C., Qiao, Y. & Zhang, C. A complete survey on LLM-based AI chatbots (2024). arXiv:2406.16937.
