Artificial intelligence (AI), particularly large language models (LLMs), is rapidly reshaping academic research. From literature synthesis to data analysis and manuscript drafting, AI promises efficiency and new forms of insight. Yet this promise is accompanied by serious ethical challenges. Current research shows that many of these challenges stem not only from technical limitations, but also from institutional unreadiness.
One of the most pressing ethical concerns in AI-assisted research is reliability. LLMs are prone to hallucinations: fluent outputs that are factually incorrect or fabricated and that can undermine scientific validity, particularly in low-error-tolerance domains such as medicine or public policy (Hua et al., 2024). These hallucinations arise from the probabilistic nature of language models, biased or outdated training data, and decoding strategies that favor plausibility over accuracy. Closely related are concerns about originality and plagiarism. Studies show that AI-generated text often recombines existing ideas rather than producing genuinely novel contributions, sometimes exceeding acceptable similarity thresholds in academic contexts (Hua et al., 2024).
Bias, privacy, and accountability further complicate ethical use. LLMs frequently reproduce social stereotypes embedded in their training data, including gender, religious, and professional biases (Le Jeune et al., 2025). At the same time, AI systems may inadvertently reproduce sensitive personal information, raising privacy risks when researchers input confidential or unpublished data (Hua et al., 2024). Responsibility becomes blurred when decisions or interpretations are partially delegated to opaque systems, challenging traditional notions of scholarly accountability.
Research ethics boards (REBs) are often ill-prepared to address these issues. Scoping and qualitative studies show that REBs lack technical expertise, standardized evaluation tools, and clear guidelines tailored to AI research (Bouhouita-Guermech et al., 2023). As a result, committees tend to focus narrowly on data privacy and consent while overlooking deeper issues such as algorithmic bias, model validity, and long-term societal impacts. This regulatory lag produces inconsistent decisions and leaves researchers without clear ethical guidance.
While governance reform is essential, researchers also have immediate, practical means to reduce ethical risks. Prompt engineering, the deliberate design of instructions given to AI systems, has emerged as a powerful mitigation strategy. Empirical evidence shows that vague prompts significantly increase hallucination rates, whereas structured prompts, particularly chain-of-thought (CoT) prompting, can substantially reduce factual errors (Anh-Hoang et al., 2025). Clear instructions that require models to cite uncertainty, explain reasoning steps, or defer when information is missing help align outputs with scholarly standards of caution and transparency.
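To make this concrete, the sketch below contrasts a vague prompt with a structured one that asks the model to reason step by step, label its uncertainty, and abstain when its sources are insufficient. The example is purely illustrative: the template wording is one possible formulation rather than a validated protocol, and the script only constructs the prompt text, leaving the choice of model and interface to the researcher.

```python
# Illustrative sketch only: contrasts a vague prompt with a structured,
# chain-of-thought style prompt that demands explicit uncertainty markers.
# No model is called here; the script just builds the prompt text.

VAGUE_PROMPT = "Summarize the evidence on LLM hallucination rates in medicine."

STRUCTURED_TEMPLATE = """You are assisting with an academic literature summary.

Question: {question}

Instructions:
1. Reason step by step before giving your final answer (chain of thought).
2. Label every factual claim as certain, uncertain, or speculative, and say why.
3. If the material below does not support a claim, reply "INSUFFICIENT EVIDENCE"
   instead of guessing.
4. Cite only sources that appear in the context; do not invent references.

Context:
{context}
"""

def build_structured_prompt(question: str, context: str) -> str:
    """Fill the template with a concrete question and the source excerpts."""
    return STRUCTURED_TEMPLATE.format(question=question, context=context)

if __name__ == "__main__":
    prompt = build_structured_prompt(
        question="Summarize the evidence on LLM hallucination rates in medicine.",
        context="Excerpts from the papers the researcher has actually read...",
    )
    print(prompt)  # send to the chosen model; verify the output against sources
```

The point of the structure is not that any particular wording is optimal, but that explicit instructions about reasoning, uncertainty, and abstention give the researcher something concrete to audit when checking the model's output.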
However, prompt engineering is not a panacea. Studies demonstrate that when a model lacks underlying knowledge, more elaborate prompts may produce more confident-sounding falsehoods (Anh-Hoang et al., 2025). Ethical use, therefore, requires researchers to treat AI outputs as provisional and to verify claims against authoritative sources.
Beyond prompt engineering, several complementary solutions are needed. First, AI literacy must become a core research competency. Understanding how models work, where they fail, and how they encode bias is essential for responsible use (Haber et al., 2025). Second, transparency and disclosure should be standard practice: researchers should clearly report how AI tools were used and for which stages of the research process. Third, institutional solutions, such as domain-specific models, retrieval-augmented generation, and post-deployment monitoring, can mitigate risks associated with general-purpose systems (Hua et al., 2024).
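The retrieval-augmented generation approach mentioned above can be sketched in a few lines: rather than answering from its parametric memory alone, the model is given passages retrieved from a curated corpus and instructed to answer only from them. The sketch below is a toy illustration, assuming a tiny in-memory corpus and a naive keyword-overlap retriever; production systems typically rely on embedding models and vector databases, but the grounding principle is the same.

```python
# Minimal sketch of the retrieval-augmented generation (RAG) idea: the model is
# asked to answer only from retrieved passages, so its claims can be traced back
# to known sources. Toy assumptions: an in-memory corpus and keyword-overlap
# retrieval instead of embeddings and a vector database.

from dataclasses import dataclass

@dataclass
class Passage:
    source: str  # citation key for the original document
    text: str

CORPUS = [
    Passage("hua2024", "ChatGPT outputs can be fluent yet factually incorrect."),
    Passage("bouhouita2023", "Ethics boards report lacking AI-specific guidelines."),
]

def retrieve(query: str, corpus: list[Passage], k: int = 2) -> list[Passage]:
    """Rank passages by crude keyword overlap with the query."""
    q_terms = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda p: len(q_terms & set(p.text.lower().split())),
                    reverse=True)
    return scored[:k]

def build_grounded_prompt(query: str) -> str:
    """Ask the model to answer strictly from retrieved passages, with citations."""
    passages = retrieve(query, CORPUS)
    context = "\n".join(f"[{p.source}] {p.text}" for p in passages)
    return ("Answer using ONLY the passages below and cite their keys.\n"
            "If they are insufficient, say so.\n\n"
            f"{context}\n\nQuestion: {query}")

if __name__ == "__main__":
    print(build_grounded_prompt("Why are ethics boards unprepared for AI research?"))
```

Because every retrieved passage carries a citation key, claims in the generated answer can be traced back to, and checked against, the original sources.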
In sum, the ethical use of AI in research cannot rely on technical fixes alone. It requires a combination of better-prepared institutions, informed researchers, and practical tools, such as prompt engineering, embedded within a culture of critical reflection and responsibility.
References
Anh-Hoang, D., Tran, V., & Nguyen, L.-M. (2025). Survey and analysis of hallucinations in large language models: Attribution to prompting strategies or model behavior. Frontiers in Artificial Intelligence, 8, 1622292. https://doi.org/10.3389/frai.2025.1622292
Bouhouita-Guermech, S., Gogognon, P., & Bélisle-Pipon, J.-C. (2023). Specific challenges posed by artificial intelligence in research ethics. Frontiers in Artificial Intelligence, 6, 1149082. https://doi.org/10.3389/frai.2023.1149082
González-Esteban, E., & Calvo, P. (2022). Ethically governing artificial intelligence in the field of scientific research and innovation. Heliyon, 8(2), e08946. https://doi.org/10.1016/j.heliyon.2022.e08946
Haber, E., Jemielniak, D., Kurasiński, A., & Przegalińska, A. (2025). Using AI in academic writing and research: A complete guide to effective and ethical academic AI. Palgrave Macmillan. https://doi.org/10.1007/978-3-031-91705-9
Hua, S. Y., Jin, S. C., & Jiang, S. Y. (2024). The limitations and ethical considerations of ChatGPT. Data Intelligence, 6(1), 201–239. https://doi.org/10.1162/dint_a_00243
Le Jeune, P., Malézieux, B., Xiao, W., & Dora, M. (2025). Phare: A safety probe for large language models. arXiv. https://doi.org/10.48550/arXiv.2505.11365