Original Article | Open Access | Aust. J. Eng. Innov. Technol., 2025; 7(6), 268-275 | doi: 10.34104/ajpab.025.02680275

The Future of Assessment: A Theoretical Model for AI-Enhanced Authentic Evaluation in Education

Sathy Akter*, Md Hasibur Rahman

Abstract

The adoption of Generative Artificial Intelligence (GenAI) has triggered a major credibility crisis in traditional modes of assessment in UK education, rendering recall-based assessment formats technologically redundant. This proposal develops and qualitatively explores the AI-Enhanced Authentic Evaluation (AI-EAE) Model, a theoretical framework that calls for a strategic shift toward complex, real-world assessment procedures. The AI-EAE Model integrates Authentic Assessment (AA), a robust ethical governance layer (transparency, bias mitigation), and the necessary pedagogical shift toward metacognition and critical AI literacy within a single system. The research responds to the pressing need for a unified model of practice that balances assessment validity, ethical responsibility, and institutional capacity, so as to avoid perpetuating institutional inequities. A qualitative, exploratory design uses a two-phase (pre-programme/post-programme) semi-structured interview guide with key stakeholders in the UK education sector (faculty, administrators, and students). The data are analysed using Braun and Clarke's thematic analysis method. The study will focus critically on areas of ethical friction, such as algorithmic bias, and on the potential of curriculum redesign to promote Collaborative AI Metacognition. The results will offer an ethically grounded institutional policy outline under which assessment remains ethical and develops the necessary 21st-century competencies.

Introduction

The pervasive adoption of Generative Artificial Intelligence (GenAI) has precipitated a crisis of construct validity in established, standardised assessment activities in the UK higher education sector. Tasks long used to assess learning (essays and multiple-choice questions (MCQs), among others) can now be performed effectively by machine intelligence, in principle exposing the technological obsolescence of such practices. The issue is two-fold: first, these instruments have sharply declined in validity for measuring the human cognitive capacities they target (critical thinking, judgement); second, reliance on reactive measures such as AI-detection software is undermined by its considerable unreliability and its potential bias against non-native speakers of English. A paradigm shift in assessment philosophy must therefore accompany policy (Mallillin and Caranguian, 2023).

The fundamental aim of the research is to establish and qualitatively validate an intensive theoretical framework, the AI-Enhanced Authentic Evaluation (AI-EAE) Model, which systematically combines the advantages of high-fidelity authentic assessment with stringent ethical AI governance and the obligatory pedagogical shift toward metacognition and AI literacy. The research seeks to move the institutional response from reactive policing to proactive, ethically grounded assessment design.

Research Question

  1. What are the perceived pedagogical benefits and institutional barriers of implementing AI-enhanced, iterative authentic assessment tasks?
  2. How do stakeholders evaluate the feasibility and integrity of ethical guardrails designed to mitigate bias and ensure transparency in AI-driven evaluation?
  3. What institutional support and training are necessary to enable faculty to design assessments that effectively cultivate student metacognition and AI literacy?

Significance of the Study

The study is significant because it provides a theoretically informed, systematic blueprint for assessment reform in UK higher education, helping institutions confront the present validity crisis. By offering deep qualitative insight into stakeholder experience, the study yields practical guidance on the professional practices and policy frameworks needed to use AI as an augmentation tool rather than a substitute for human judgement. This strategic shift aligns education with growing industry expectations for higher-order, 21st-century skills that lie beyond the reach of traditional tests.

Assumptions of the Study

The research presupposes that the existing capabilities of GenAI have already irreparably compromised the validity of conventional assessments (essays, MCQs) and that UK universities are committed to adopting technologies that safeguard high standards of academic integrity and fair outcomes. It also assumes that qualitative interview data from a purposive sample of academic stakeholders will provide a sound basis for theoretical validation of the proposed model.

Limitations and Delimitations

Limitations: The principal weakness is that qualitative findings are not generalisable; the study relies on self-reported accounts from stakeholders at a single institution, which restricts statistical extrapolation of the findings to the wider UK context. Delimitations: The study is deliberately confined to a qualitative, exploratory investigation of the theoretical scope and perceived ethical issues of the AI-EAE Model. It does not provide an empirical, quantitative estimate of the model's effect on actual student performance or assessment measures, which would require a large-scale longitudinal follow-up study.

Key Concepts


Review of Literature

Crisis of Validity: Evaluation Obsolescence in the AI Era

The foundational literature attests that GenAI poses a threat to construct validity. Common assessment formats (MCQs, short-answer questions, and standardised essays) are becoming obsolete because they test skills that have already been automated. Generative AI can produce new, original-sounding text, making breaches of academic integrity such as plagiarism difficult to track and address, as shown by Anthology (2024). Reactive policies, such as outright bans or AI-detection technology, have proven ineffective and frequently inaccurate, and emerging research indicates bias against non-native English speakers.

This technological shock aligns with longstanding criticism of standardised, high-stakes testing. Studies have shown that locally reported achievement gains often fail to correlate with national external examinations, signalling a serious deficiency in external validity. High-stakes testing has also been criticised for producing discriminatory drop-out rates and failing to deliver equitable access for all students, a concern that carries over to algorithmic bias.

The Authentic Imperative: The Return to Higher-Order Skills

The systematic use of Authentic Assessment (AA) offers a strong pedagogical response to the AI crisis. AA requires applying theory to realistic professional practice, demanding judgement and innovation and integrating a body of knowledge and skills to negotiate intricate assignments. Crucially, authentic projects are iterative, designed to yield diagnostic data and improve subsequent performance, in contrast to one-shot traditional tests that deliver only a mark. The literature confirms that such real-world activities build essential 21st-century skills, including creativity, problem-solving, and self-reflection, which standardised tests fail to capture yet which are increasingly demanded, as reported by the World Economic Forum (2025).

The implementation problem of AA lies in the vast time and logistical effort needed to provide continuous, comprehensive human feedback, which is precisely where AI serves as an augmentation tool. AI-powered assessment systems can scan student work in real time, offering immediate, customised, and adaptive feedback while reducing teacher grading workload by as much as 70 per cent. This capacity makes iterative authentic tasks scalable and defensible across the large cohorts of learners typical of UK institutions.

Ethical Governance: Reducing Algorithmic Bias and Improving Transparency

Large-scale implementation of AI in evaluation poses serious risks, particularly regarding justice, accountability, and the entrenchment of existing social inequities. Algorithmic bias, whereby AI grading systems reproduce and propagate prejudices present in their training data, is a central concern. Automated essay-scoring programmes have been shown to be prone to bias against certain groups, potentially awarding lower marks to essays by Black students or by non-native English speakers even when the content is equivalent. This risk of reinforcing unfair treatment underlines the necessity of sound ethical governance (Ahmad et al., 2018).

Mitigation must be planned and undertaken across the AI model lifecycle, from conception to deployment, with diverse data collected and properly pre-processed. OECD (2023) principles emphasise the importance of trustworthy, transparent, and accountable systems. This is essential because the "black box" nature of some AI algorithms makes assessment decisions difficult to challenge, raising concerns about accountability and trust. The AI-EAE Model therefore requires frameworks such as P.A.T.H. (Principles, AI, Transparency, Humanity), in which constant human oversight ensures that human judgement is never displaced, closing any accountability gap and protecting privacy rights.
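As a purely illustrative sketch of what a minimal group-parity check on an automated marker might look like (this is not part of the proposed study; the data, group labels, and disparity threshold below are all hypothetical assumptions):

```python
# Minimal sketch of a group-parity audit for automated essay scores.
# All data, group labels, and the disparity threshold are hypothetical.
from statistics import mean

def mean_score_by_group(records):
    """Return mean AI-assigned score per demographic group.

    records: list of (group_label, score) pairs.
    """
    groups = {}
    for group, score in records:
        groups.setdefault(group, []).append(score)
    return {g: mean(scores) for g, scores in groups.items()}

def flag_disparity(group_means, threshold=5.0):
    """Flag the audit if the largest gap between group means exceeds threshold."""
    gap = max(group_means.values()) - min(group_means.values())
    return gap, gap > threshold

# Hypothetical audit data: (group, score out of 100).
sample = [("native", 72), ("native", 68), ("non-native", 61), ("non-native", 63)]
means = mean_score_by_group(sample)
gap, flagged = flag_disparity(means)
```

In practice such an audit would be run at each lifecycle stage the text names (conception, data collection, pre-processing, deployment) and a flagged disparity would trigger the human-oversight step required by the P.A.T.H. framework rather than any automatic correction.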

Pedagogical Reframing: Metacognition, AI Literacy, and Learner Agency

The effective implementation of AI-enhanced assessment requires a radical pedagogical shift toward the development of critical thinking and metacognitive abilities. Policy bodies such as UNESCO (2023) advocate adding AI literacy education to the curriculum, cultivating the ability to evaluate AI outputs for bias, accuracy, and limitations. One route is to design assignments that explicitly require students to use AI to generate a response and then critique and refine its output, shifting cognitive labour to high-level analysis and refinement.

Central to this shift is metacognition (the monitoring and control of one's own thinking), which remains critical in managing collaboration between humans and artificial intelligence. As Hutson and Plate (2023) describe, collaborative AI metacognition emphasises the strategic division of labour, inspection of AI-produced output, and reflection on the most effective use of the tool. The Self-Regulated Learning (SRL) model of Winne and Hadwin (1998) is especially relevant, as it outlines stages of task definition, goal setting, strategy enactment, and continuous metacognitive adaptation, all of which are enhanced by the immediate diagnostic feedback AI systems provide. Moreover, assessment design should address inclusion by offering a diversified range of assessment techniques to meet the needs of neurodivergent students, in line with inclusive-practice principles.

Methodology

Research Design

The proposed research design is an interpretive qualitative approach, well suited to studies that focus on subjective experiences, beliefs, and the complex decision-making of UK educators regarding the adoption of AI-EAE practices. It is a multi-case, exploratory design built around a two-phase, pre-programme/post-programme semi-structured interview protocol. This longitudinal design is needed to establish a baseline of current perceptions (Phase I) and then record shifts in attitude and perceived feasibility (Phase II) after stakeholders have engaged with the theoretical model.

Sampling

A purposive sample of 15-20 key educational stakeholders will be recruited from a UK higher education institution actively grappling with generative AI assessment issues. The sample will comprise university faculty from various disciplines, assessment administrators (to understand policy and governance issues), and high-engagement learners (to gauge effects on learner agency and technostress). This setting offers a rich, current context for analysing attitudes toward technology acceptance and assessment.

Data Collection & Review

Data will be gathered through semi-structured interviews. The instrument will ensure coverage of all theoretical constructs of the AI-EAE Model while retaining the flexibility required for deep probing and exploration of emergent themes.

Phase I (Pre-Programme Interview): Establishes a baseline of the current assessment culture, existing policy on and use of GenAI, major ethical concerns (e.g., bias, privacy), and the current barriers to adopting authentic assessment practices.

Phase II (Post-Programme Interview): Determines whether perceptions change after participants are exposed to the AI-EAE Theoretical Model, focusing on perceived feasibility, judgements of the effectiveness of the proposed ethical guidance (the P.A.T.H. framework), and perceived effects on student agency and metacognition.

Results and Discussion

Thematic Analysis (TA), the structured six-phase procedure developed by Braun and Clarke (2006), will be used to examine the transcribed qualitative data in both phases: familiarisation, coding, generating themes, reviewing themes, defining and naming themes, and writing up. This rigorous procedure ensures that the themes form meaningful interpretative narratives that directly answer the research questions and cohere with the theoretical assumptions made.
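As a purely illustrative sketch of the bookkeeping involved in the coding and theme-generation phases (the codes, excerpts, and code-to-theme mapping below are hypothetical examples, not study data):

```python
# Hypothetical sketch: grouping coded interview excerpts into candidate themes,
# mirroring the move from the coding phase to the generating-themes phase.
from collections import defaultdict

# Output of line-by-line coding: (code, excerpt) pairs.
coded_excerpts = [
    ("workload", "Redesigning every module brief takes weeks."),
    ("trust", "I would not accept a grade no one can explain."),
    ("workload", "AI feedback cuts my marking time."),
]

# Provisional mapping from codes to candidate themes (revised during review).
code_to_theme = {
    "workload": "Technostress and support",
    "trust": "Conditional trust",
}

def generate_themes(excerpts, mapping):
    """Collate coded excerpts under their candidate themes."""
    themes = defaultdict(list)
    for code, excerpt in excerpts:
        themes[mapping[code]].append(excerpt)
    return dict(themes)

themes = generate_themes(coded_excerpts, code_to_theme)
```

The analytic work in TA is of course interpretative, not mechanical; a structure like this merely keeps each candidate theme traceable back to its supporting excerpts during the reviewing and defining phases.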

Actual Project – AI-Enhanced Authentic Evaluation (Expected Outcomes and Future Work) 

Assessment reform is guided by the AI-EAE Model, a theoretical framework that explicitly bridges authentic assessment, ethical governance, and pedagogical transformation. It posits that AI's value lies not in substituting for the human judge but in supplying the customised, timely diagnostic feedback needed to scale iterative authentic tasks and to promote metacognitive SRL in learners.

Description (Expected Thematic Findings)

It is anticipated that the thematic analysis will produce three major narratives:

Themes are expected to confirm faculty perceptions that restructured authentic tasks requiring critical judgement do assess skills that cannot be measured by current GenAI, thereby restoring construct validity. Findings are also expected to highlight significant faculty technostress: even as AI reduces the grading load, the intellectual workload of curriculum redesign is perceived to rise, underscoring how badly organised institutional support and training are needed.

Stakeholder trust in the model is expected to be conditional on demonstrated algorithmic transparency and a commitment to human oversight of summative decisions (the Humanity principle), mitigating the risks of bias and of learner loneliness.

Implementation Plan 

Assessment (Interpretation: Consequential Validity)

The analytical appraisal of the AI-EAE Model rests on its consequential validity: whether its application produces desirable educational consequences. If the findings indicate broad stakeholder support for the model's ethical guardrails and pedagogical requirements, this will confirm the feasibility of an equitable and sustainable assessment strategy. Conversely, evidence of persistent bias, or thematic opposition to transparency, would require urgent revision of the model's ethical governance construct.

Cited Examples

The design of the model draws on established strategies such as:

  • Asking students to generate an AI response and then critique its accuracy and bias, shifting their cognitive labour toward analysis.
  • Having students submit complete prompt histories and justifications for their choice of AI tools, exposing their thinking processes to monitoring and evaluation.
  • Using AI role-play so that students rehearse a professional identity by simulating interactions with a professional, exercising more complex judgement.

Reflection on the Study

Project Results and Summary (Expected)

It is anticipated that the research will demonstrate both the need for and the theoretical validity of the AI-EAE Model. It is expected to show stakeholder agreement that AI is a prerequisite for scaling high-quality authentic evaluation logistically, with more effective feedback systems securing the model's pedagogical worth. Overall, the thematic findings are expected to show that academic integrity is restored not through prohibition but through curriculum redesign that emphasises skills AI cannot excel at, such as critical analysis and complex human judgement.

Strengths and Limitations (Critical Analysis)

Strengths: The model's chief strength is its systematic integration of three important areas, namely Assessment Theory, Ethical Governance, and Pedagogical Practice, into one coherent theoretical framework, filling an important gap in the literature. The two-phase qualitative design provides the depth and context needed to understand cultural and emotional barriers (e.g., technostress) that prescriptive policy cannot measure. Limitations: The findings are qualitative, local in scope, and cannot be generalised statistically. As a theoretical model, it requires further extensive external validation and quantitative empirical research to quantify its longitudinal effects on student achievement and equity outcomes.

Research Reflections

The study indicates the urgent need for UK educational policy to shape the application of AI proactively rather than react to its evolving capabilities. The anticipated finding that stakeholders place great value on the Humanity principle is a pivotal indication of the permanence of the human factor in education. Any evaluation system should complement teacher-student mentoring relationships, alleviating the loneliness and isolation that can result from excessive reliance on AI tools.

Conclusion and Recommendations

The spread of GenAI has irreversibly called into question the validity of traditional evaluation. The AI-Enhanced Authentic Evaluation (AI-EAE) Theoretical Model offers the critical strategic roadmap needed to steer educational institutions through the validity crisis toward a sustainable and ethical assessment future. By combining authentic pedagogy with strict ethical governance and prioritising essential human skills, the AI-EAE Model can serve as a template for developing a new generation of critical thinkers and well-rounded professionals. Based on the AI-EAE Model and the anticipated results, the following recommendations are offered to UK higher education institutions:

  • Invest heavily in faculty professional development, specifically in designing complex, higher-order authentic assessment tasks that incorporate AI literacy and critical analysis.
  • Adopt a rubric such as P.A.T.H. to enforce transparent AI use, constant human supervision of summative assessment, and a clear policy assigning responsibility to institutions and students in the event of an algorithmic error.
  • Establish a systematic auditing procedure, following the model lifecycle (conception, data collection, pre-processing), to actively identify and address bias in any adopted AI evaluation tool and ensure fair outcomes for every student demographic.
  • Revise assignments to evaluate students explicitly on their collaboration with AI (e.g., prompt histories, critiques), emphasising metacognitive capabilities (planning, monitoring, reflection) rather than the product alone.

Author Contributions

S.A.: Conceptualization, review of literature, data analysis, and manuscript drafting. M.H.R.: Methodology design, technical validation, figure preparation, and critical manuscript revision.

Acknowledgment

We begin by sincerely thanking Allah for giving us the courage and chance to successfully finish this Research. We are incredibly appreciative of our family members, whose efforts and unwavering support have served as a continual source of inspiration. Additionally, we certify that the author and co-author were solely responsible for the conception, data gathering, analysis, and writing of this paper.

Conflicts of Interest

The authors declare no conflicts of interest related to this study.

Supplemental Materials:


UniversePG does not own the copyrights to Supplemental Material that may be linked to, or accessed through, an article. The authors have granted UniversePG a non-exclusive, worldwide license to publish the Supplemental Material files. Please contact the corresponding author directly for reuse.

Article References:

  1. Ahmad T, Uddin ME, and Hossain I. (2018). Evaluation of Microbial and Physiochemical Properties of Three Selected Lakes Water in Dhaka City, Bangladesh. Scholars Academic J. of Biosciences, 6(2), 230-238. https://doi.org/10.21276/sajb.2018.6.2.17 
  2. Amrein, A. & Berliner, D. (2002) ‘Reports: High Stakes Testing Hurts Education', The Examiner.
  3. Anthology. (2024) AI, Academic Integrity, and Authentic Assessment - An Ethical Path Forward for Education. https://www.anthology.com/paper/ai-academic-integrity-and-authentic-assessment-an-ethical-path-forward-for-education      
  4. Au, W. (2022) Unequal by design: High-stakes testing and the standardization of inequality. Routledge.
  5. Bekele, W.B. and Ago, F.Y. (2022) ‘Sample size for interview in qualitative research in social sciences: A guide to novice researchers', Research in Educational Policy and Management, 4(1), pp.42-50.
  6. Berry, C.A. and Brandeis Hill Marshall (2024) Mitigating Bias in Machine Learning. McGraw Hill Professional.
  7. Braun, V. and Clarke, V. (2006) ‘Using thematic analysis in psychology', Qualitative research in psychology, 3(2), pp.77-101.
  8. Braun, V. and Clarke, V. (2023) ‘Toward good practice in thematic analysis: Avoiding common problems and be (com) ing a knowing researcher', Inter j. of transgender health, 24(1), pp.1-6.
  9. Cabitza, F., Campagner, A., and Carobene, A. (2021) ‘The importance of being external. methodological insights for the external validation of machine learning models in medicine', Computer methods and programs in biomedicine, 208, p.106288.
  10. Chen, L., Chen, P. and Lin, Z. (2020) ‘Artificial intelligence in education: A review', IEEE access, 8, pp.75264-75278.
  11. Chivanga, S.Y. and Monyai, P.B. (2021) ‘Back to basics: Qualitative research methodology for beginners', Journal of Critical Reviews, 8(2), pp.11-17.
  12. Crawford, J., Allen, K.A., Pani, B. and Cowling, M. (2024) ‘When artificial intelligence substitutes humans in higher education: the cost of loneliness, student success, and retention', Studies in Higher Education, 49(5), pp.883-897.
  13. Kang, E. and Hwang, H.J. (2021) ‘Ethical conducts in qualitative research methodology: Participant observation and interview process', 연구윤리 (Research Ethics), 2(2), pp.5-10.
  14. Fernando, T. (2024) ‘Embracing Neurodiversity in Education: A Review of Inclusive Practices, Policies, and Pedagogies', SchoRes Journal of Education Research, 1(2).
  15. Foong, C.C., Bashir Ghouse, N.L., and Vadivelu, J. (2021) ‘A qualitative study on self-regulated learning among high performing medical students', BMC medical education, 21(1), p.320.
  16. Halaweh, M. and El Refae, G. (2024) ‘Examining the accuracy of AI detection software tools in education', In 2024 Fifth International Conference on Intelligent Data Science Technologies and Applications (IDSTA), pp. 186-190. IEEE.
  17. Hutson, J. and Plate, D. (2023) ‘Human-AI collaboration for smart education: reframing applied learning to support metacognition', IntechOpen.
  18. Ifelebuegu, A.O. (2023) ‘Rethinking online assessment strategies: Authenticity versus AI chatbot intervention', Journal of Applied Learning & Teaching, 6(2), pp.385-392.
  19. Jiang, Y., Hao, J., Fauss, M. and Li, C. (2024) ‘Detecting ChatGPT-generated essays in a large-scale writing assessment: Is there a bias against non-native English speakers?', Computers & Education, 217, p.105070.
  20. Mallillin LLD., and Caranguian RG. (2023). Management of educational system and practice: a guide to academic transformation, Br. J. Arts Humanit., 5(3), 131-141. https://doi.org/10.34104/bjah.02301310141 
  21. OECD. (2023) OECD AI Principles: Recommendations on Artificial Intelligence. https://www.oecd.org/en/topics/sub-issues/ai-principles.html  
  22. Raza, H. (2024) ‘Ai-driven assessment: Reliability, bias, and ethical implications', AI EDIFY Journal, 1(2), pp.36-47.
  23. Routray, R. and Khandelwal, K. (2024) ‘Artificial intelligence (AI) adoption: do Generation Z students feel technostress in deploying AI for completing courses of study at universities?', Asian Education and Development Studies, 13(5), pp.534-545.
  24. Sharma, A., Thakur, K., and Singh, K.J. (2023) ‘Designing inclusive learning environments: Universal Design for Learning in practice', In The impact and importance of instructional design in the educational landscape (pp. 24-61). IGI Global.
  25. Spector, J.M. and Ma, S. (2019) ‘Inquiry and critical thinking skills for the next generation: from artificial intelligence back to human intelligence', Smart Learning Environments, 6(1), pp.1-11.
  26. Stahl, B.C., Antoniou, J., and Warso, Z. (2023) ‘A systematic review of artificial intelligence impact assessments', Artificial Intelligence Review, 56(11), pp.12799-12831.
  27. Kusmawan, U. (2023) Shaping the Future Assessment: The Evolution of Assessment and its Impact on Student Learning and Success. Teaching and Learning Symposium 2023: The Future of Assessment, Universiti Malaya, 22 November 2023, 1(1). https://www.researchgate.net/publication/374168376
  28. UNESCO (2023) Artificial intelligence in education. [online] UNESCO. https://www.unesco.org/en/digital-education/artificial-intelligence   
  29. World Economic Forum. (2025) The Future of Jobs Report 2025. https://www.weforum.org/publications/series/future-of-jobs/ 

Article Info:

Academic Editor

Dr. Toansakul Tony Santiboon, Professor, Curtin University of Technology, Bentley, Australia

Received

November 21, 2025

Accepted

December 22, 2025

Published

December 30, 2025

Article DOI: 10.34104/ajpab.025.02680275

Corresponding author

Sathy Akter*

Master of Instructional Technology, Touro University of New York, Manhattan, NY

Cite this article

Akter S., and Rahman MH. (2025). The future of assessment: a theoretical model for AI-enhanced authentic evaluation in education. Aust. J. Eng. Innov. Technol., 7(6), 268-275. https://doi.org/10.34104/ajpab.025.02680275
