The Future of Assessment: A Theoretical Model for AI-Enhanced Authentic Evaluation in Education
The adoption of Generative Artificial Intelligence (GenAI) has triggered a major crisis of credibility in traditional modes of assessment in UK education, rendering recall-based formats technologically redundant. This proposal develops and qualitatively explores the AI-Enhanced Authentic Evaluation (AI-EAE) Model, a theoretical framework that calls for a strategic shift towards complex, real-world assessment procedures. The AI-EAE Model integrates Authentic Assessment (AA), a rigorous ethical governance lens (transparency, bias mitigation), and a necessary pedagogical shift towards metacognition and critical AI literacy. The research responds to the pressing need for a unified model of practice that balances assessment validity, ethical responsibility, and institutional capacity, so as to avoid perpetuating institutional inequities. A qualitative, exploratory approach employs a two-phase (pre-programme/post-programme) semi-structured interview guide with key stakeholders in the UK education sector (faculty, administrators, and students). Braun and Clarke's thematic analysis method is used to analyse the data. The study focuses critically on areas of ethical friction, such as algorithmic bias, and on the potential of curriculum redesign to promote Collaborative AI Metacognition. The results will offer an ethically grounded institutional policy outline in which assessment is both ethical and capable of developing essential 21st-century competencies.
Pervasive adoption of Generative Artificial Intelligence (GenAI) has provoked a crisis of construct validity in established, standardised assessment activities in the UK higher education sector. Tasks long used to assess learning, such as essays and multiple-choice questions (MCQs), can now be completed effectively by machine intelligence, exposing the technological obsolescence of such practices. The issue is twofold: first, these instruments have sharply declined in validity as measures of the human cognitive capacities they are meant to capture (critical thinking, judgement); second, reliance on reactive measures such as AI detection software is undermined by its considerable unreliability and its documented bias against non-native speakers of English. What is required instead is a paradigm shift in assessment philosophy (Mallillin and Caranguian, 2023).
The fundamental aim of the research is to establish and qualitatively examine an integrative theoretical framework, the AI-Enhanced Authentic Evaluation (AI-EAE) Model, which systematically combines the advantages of high-fidelity authentic assessment with stringent ethical AI governance and obligatory pedagogical change in metacognition and AI literacy. The research seeks to shift institutional responses from reactive policing towards proactive, ethically grounded assessment design.
Research Question
Significance of the Study
The study is significant because it provides a theoretically informed, systematic blueprint for assessment reform in UK higher education that can help institutions confront the present validity crisis. By offering rich qualitative insight into stakeholder experience, the study yields practical guidance on the professional development and policy frameworks needed to use AI as an augmentation tool rather than a substitute for human judgement. This strategic change also aligns education with growing industry expectations of higher-order, 21st-century skills that lie beyond the reach of traditional tests.
Assumptions of the Study
The research presupposes that the existing capabilities of GenAI have already irreparably compromised the authenticity of conventional assessments (essays, MCQs), and that UK universities are committed to adopting approaches that safeguard academic integrity and fair outcomes. It further assumes that qualitative interview data from a purposive sample of academic stakeholders will provide a sound basis for theoretical validation of the proposed model.
Limitations and Delimitations
Limitations: The first weakness is that qualitative results are not generalisable; the study relies on the self-reported perceptions of stakeholders at a single institution, which restricts statistical extrapolation of the findings to the wider UK context. Delimitations: The study is deliberately limited to a qualitative, exploratory examination of the theoretical scope and perceived ethical issues of the AI-EAE Model. It does not offer an empirical, quantitative estimate of the model's effect on actual student performance or assessment measures, which would require a large-scale longitudinal follow-up study.
Key Concepts
Crisis of Validity: Assessment Obsolescence in the AI Era
The foundational literature attests that the challenge of GenAI is a threat to construct validity. Common assessment formats, including MCQs, short-answer questions, and standardised essays, are becoming obsolete because they test skills that have already been automated. Generative AI can produce new, original-sounding text, making plagiarism-based breaches of academic integrity difficult to detect and address, as shown in the case of Anthology (2024). Reactive policies, such as outright bans or reliance on AI detection technology, have proven ineffective and frequently inaccurate, with emerging research indicating bias against non-native English speakers.
This technological shock aligns with longstanding criticism of standardised, high-stakes testing. Studies have shown that locally reported achievement gains often fail to correlate with national external examinations, signalling a substantial deficiency in external validity. High-stakes testing has also been criticised for producing discriminatory drop-out rates and for failing to deliver equitable access to all students, a concern that algorithmic bias now echoes.
The Authentic Imperative: The Return to Higher-Order Skills
The systematic adoption of Authentic Assessment (AA) offers a robust pedagogical response to the AI crisis. AA requires the application of theory to realistic professional practice, demanding judgement and innovation, and the integration of knowledge and skills to negotiate complex tasks. Crucially, authentic projects are iterative and designed to generate diagnostic data that improves later performance, in contrast to one-shot traditional tests that yield only a mark. The literature confirms that such real-world activities build essential 21st-century skills, including creativity, problem-solving, and self-reflection, which standardised tests fail to capture but which are increasingly demanded, as noted by the World Economic Forum (2025).
It is precisely AA's implementation problem, namely the vast amount of time and logistical effort required to provide continuous, comprehensive human feedback, that AI as an augmentation tool can address. AI-powered assessment systems can analyse student work in real time, offering immediate, customised, and adaptive feedback while reducing teacher grading workload by as much as 70 per cent. This capability makes iterative authentic tasks feasible at scale across the large cohorts of learners found in UK higher education.
Ethical Governance: Mitigating Algorithmic Bias and Improving Transparency
Large-scale implementation of AI in assessment poses serious risks, particularly to fairness, accountability, and the reinforcement of existing social injustices. The issue of algorithmic bias, whereby AI grading systems reproduce and propagate prejudices present in their training data, is critical. Automated essay-marking programmes have been shown to be prone to bias against certain groups, for example scoring essays by Black students or non-native English speakers lower even when the content is equivalent. This risk of entrenching unfair treatment underscores the necessity of a sound ethical framework (Ahmad et al., 2018).
Mitigation must be planned and undertaken across the whole AI model lifecycle, from conception to deployment, with diverse data collected and properly pre-processed. OECD (2023) principles emphasise the importance of trustworthy, transparent, and accountable systems. This is essential because the 'black box' nature of some AI algorithms makes assessment decisions difficult to challenge without an understanding of how they were reached, raising issues of accountability and trust. The AI-EAE Model therefore requires frameworks such as P.A.T.H. (Principles, AI, Transparency, Humanity), in which continuous human oversight ensures that human judgement is not displaced, preventing an accountability gap and protecting privacy rights.
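As a simple illustration of the kind of bias audit such governance implies, the sketch below compares the mean marks an automated scorer assigns to essays of comparable quality across learner groups and reports the largest gap. The scores, group labels, and the very idea of flagging a gap for human review are invented placeholders, a minimal demographic-style check rather than a complete fairness evaluation.

```python
# Illustrative audit of an automated essay scorer for group-level score gaps.
# All data below are hypothetical; a real audit would use held-out essays
# with matched content quality across demographic groups.
from statistics import mean

def group_score_gap(scores, groups):
    """Return the mean score per group and the largest pairwise gap."""
    by_group = {}
    for score, group in zip(scores, groups):
        by_group.setdefault(group, []).append(score)
    means = {g: mean(v) for g, v in by_group.items()}
    gap = max(means.values()) - min(means.values())
    return means, gap

# Hypothetical AI-assigned marks for essays of comparable quality.
scores = [72, 68, 70, 61, 59, 63]
groups = ["L1-English", "L1-English", "L1-English",
          "EAL", "EAL", "EAL"]

means, gap = group_score_gap(scores, groups)
print(means)  # mean mark per group
print(gap)    # a gap above a pre-set tolerance would trigger human review
```

In this invented example the nine-point gap between the groups would be exactly the kind of signal that, under the P.A.T.H. framework's humanity principle, should route the affected scripts back to a human marker.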
Pedagogical Reframing: Metacognition, AI Literacy, and Learner Agency
Effective implementation of AI-enhanced assessment requires a radical pedagogical shift towards the development of critical thinking and metacognitive abilities. Policy agencies such as UNESCO (2023) support embedding AI literacy in the curriculum, promoting the ability to evaluate AI outputs for bias, accuracy, and limitations. This can be achieved through assignments that explicitly require students to use AI to generate a response and then critique and refine its outputs, shifting cognitive labour towards higher-level analysis and refinement.
Central to this change is metacognition (the monitoring and control of one's own thinking), which is critical for managing collaboration between humans and artificial intelligence. As described by Hutson and Plate (2023), collaborative AI metacognition emphasises the strategic division of labour, inspection of AI-produced output, and reflection on how to use the tool most effectively. The Self-Regulated Learning (SRL) model of Winne and Hadwin (1998) is especially relevant because it outlines stages of task definition, goal setting, strategy enactment, and continuous metacognitive adaptation, each of which is strengthened by the immediate diagnostic feedback that AI systems can provide. Assessment design should also address inclusion, offering a diversified range of assessment techniques to meet the needs of neurodivergent students, in line with the principles of inclusive practice.
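As a schematic illustration only, the loop below sketches how the SRL stages might interleave with AI diagnostic feedback. Every name and number in it (the draft-quality scale, the feedback rule, the increments) is an invented placeholder, not a description of any real system or of Winne and Hadwin's formal model.

```python
# Schematic sketch of an SRL cycle with AI feedback at the adaptation step.
# The quality scale (0-1), goal increment, and feedback rule are hypothetical.

def ai_diagnostic_feedback(draft_quality):
    """Stand-in for an AI feedback system returning a suggested focus area."""
    return "revise argument structure" if draft_quality < 0.8 else "polish style"

def srl_cycle(initial_quality, rounds=3):
    """Iterate task definition -> goal setting -> strategy -> adaptation."""
    quality = initial_quality
    log = []
    for _ in range(rounds):
        goal = min(1.0, quality + 0.15)             # goal setting
        feedback = ai_diagnostic_feedback(quality)  # strategy enactment + AI feedback
        quality = min(goal, quality + 0.1)          # metacognitive adaptation
        log.append((round(quality, 2), feedback))
    return log

print(srl_cycle(0.5))
```

The point of the sketch is structural: the AI supplies diagnostic input at each pass, but the goal-setting and adaptation steps remain the learner's own, which is where the metacognitive gain is claimed to lie.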
Research Design
The proposed research design is an interpretive qualitative approach, well suited to studies that focus on the subjective experiences, beliefs, and complex decision-making of UK educators regarding the adoption of AI-EAE practices. It is a multi-case, exploratory study built around a two-phase, pre-programme/post-programme semi-structured interview protocol. This longitudinal design is needed to establish a baseline of current perceptions (Phase I) and then record changes in attitude and perceived feasibility (Phase II) after stakeholders have engaged with the theoretical model.
Sampling
A purposive sample of 15-20 key education stakeholders will be recruited at a UK higher education institution actively grappling with generative AI assessment issues. The sample will consist of university faculty from various disciplines, assessment administrators (to understand policy and governance issues), and highly engaged students (to gauge effects on learner agency and technostress). This setting offers a rich, current context for analysing attitudes towards technology acceptance and assessment.
Data Collection & Review
Data will be gathered using semi-structured interviews. The instrument will ensure that all theoretical constructs of the AI-EAE Model are covered while retaining the flexibility needed to probe deeply and explore emergent themes.
Phase I (Pre-Programme Interview): Will establish a baseline of the current assessment culture, existing policy on and use of GenAI, major ethical concerns (e.g., bias, privacy), and the barriers that currently impede the adoption of authentic assessment practices.
Phase II (Post-Programme Interview): Will determine whether perceptions change after participants are exposed to the AI-EAE Theoretical Model, focusing on perceived feasibility, judgements of the proposed ethical guidance (the P.A.T.H. framework), and the perceived effect on student agency and metacognition.
Thematic Analysis (TA), following the six-phase procedure of Braun and Clarke (2006), will be used to examine the transcribed qualitative data from both phases: familiarisation, coding, generating themes, reviewing themes, defining and naming themes, and writing up. This rigorous procedure ensures that themes form meaningful interpretative narratives that directly answer the research questions and cohere with the theoretical assumptions made.
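To make the movement from codes to themes concrete, the sketch below organises a handful of invented interview extracts (hypothetical data, illustrative code and theme labels) from initial codes into candidate themes. It illustrates only the bookkeeping of the coding and theme-generation phases; the interpretative judgement at the heart of Braun and Clarke's procedure cannot, of course, be automated.

```python
# Minimal sketch of the coding-to-themes bookkeeping in thematic analysis.
# All extracts, codes, and theme names below are invented examples.
from collections import defaultdict

# Phase 2: initial codes attached to interview extracts (hypothetical data).
coded_extracts = [
    ("marking load has dropped since we trialled AI feedback", "workload"),
    ("I worry the scorer penalises non-native phrasing", "bias"),
    ("students now critique the AI draft before submitting", "metacognition"),
    ("redesigning every module brief is exhausting", "workload"),
]

# Phase 3: candidate themes grouping related codes (the analyst's judgement).
theme_map = {
    "workload": "Technostress and institutional support",
    "bias": "Conditional trust and ethical governance",
    "metacognition": "Collaborative AI metacognition",
}

themes = defaultdict(list)
for extract, code in coded_extracts:
    themes[theme_map[code]].append(extract)

# Phases 4-5 (reviewing, defining and naming) would refine this grouping by hand.
for theme, extracts in themes.items():
    print(theme, len(extracts))
```

A structure like this also supports the reviewing phase, since extracts filed under a theme can be re-read together to check that the theme still coheres.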
Actual Project – AI-Enhanced Authentic Evaluation (Expected Outcomes and Future Work)
Assessment reform is guided by the AI-EAE Model, a theoretical framework that explicitly bridges authentic assessment, ethical governance, and pedagogical transformation. It postulates that the value of AI lies not in substituting for the human judge but in the customised, timely diagnostic feedback needed to scale time-intensive authentic tasks and to foster metacognitive SRL in learners.
Description (Expected Thematic Findings)
It is anticipated that the thematic analysis will produce three major narratives:
First, themes are expected to confirm faculty perceptions that restructured authentic tasks requiring critical judgement do assess skills that modern GenAI cannot replicate, thereby restoring construct validity. Second, results are expected to highlight significant technostress among faculty: the intellectual workload of curriculum redesign is perceived to rise even as AI reduces the grading load, underscoring the need for well-organised institutional support and training.
Third, stakeholder trust in the model is expected to prove conditional on demonstrated algorithmic transparency and on a commitment to human oversight of summative decisions (the principle of humanity) to reduce the risks of bias and isolation.
Implementation Plan
Assessment (Interpretation and Consequential Validity)
The analytical evaluation of the AI-EAE Model rests on its consequential validity: whether its application produces desirable educational effects. If the findings indicate that stakeholders broadly support the model, accepting its ethical guardrails and pedagogical requirements, this will confirm the feasibility of an equitable and sustainable assessment strategy. Conversely, evidence of persistent bias, or thematic opposition to transparency, would require urgent revision of the model's ethical governance construct.
Cited Examples
The design of the model is based on established strategies.
Reflection of the study
Project Results and Summary (Expected)
It is anticipated that the study will demonstrate the need for, and theoretical validity of, the AI-EAE Model. It is expected to show stakeholder agreement that AI is a prerequisite for logistically scaling high-quality authentic evaluation, with more effective feedback systems securing the model's pedagogical worth. Overall, the thematic findings are expected to show that academic integrity cannot be restored through prohibition, but rather through curriculum redesign that emphasises skills at which AI does not excel, such as critical analysis and complex human judgement.
Strengths and Limitations (Critical Analysis)
Strengths: The major strength is that the model offers a systematic integration of three important areas, namely Assessment Theory, Ethical Governance, and Pedagogical Practice, into one consistent theoretical framework, filling an important gap in the literature. The two-phase qualitative design provides the depth and context needed to understand cultural and emotional barriers (e.g., technostress) that prescriptive policy cannot measure. Limitations: The findings are qualitative and locally bounded, and so cannot be generalised statistically. As a theoretical construct, the model requires further extensive external validation and quantitative empirical research to quantify its longitudinal effects on student achievement and equity outcomes.
Research Reflections
The study indicates the urgent need for UK educational policy to actively shape the application of AI rather than react to its evolving capabilities. The anticipated finding that stakeholders place great value on the principle of Humanity is a pivotal indication of the permanence of the human factor in education. Any evaluation system should supplement, not supplant, teacher-student mentoring relationships, alleviating the psychological consequences of loneliness and isolation that can result from excessive reliance on AI tools.
The spread of GenAI has permanently called into question the validity of traditional evaluation. The AI-Enhanced Authentic Evaluation (AI-EAE) Theoretical Model offers the critical, strategic roadmap required to move educational institutions from the current crisis of validity towards a sustainable and ethical assessment future. By combining genuine pedagogy with strict ethical governance and prioritising essential human skills, the AI-EAE Model can serve as a template for developing a new generation of critical thinkers and well-rounded professionals. Based on the model and its anticipated results, recommendations can be offered to institutions of UK higher education.
S.A.: Conceptualization, review of literature, data analysis, and manuscript drafting. M.H.R.: Methodology design, technical validation, figure preparation, and critical manuscript revision.
We begin by sincerely thanking Allah for giving us the courage and the opportunity to complete this research successfully. We are deeply grateful to our family members, whose efforts and unwavering support have been a continual source of inspiration. We further certify that the author and co-author were solely responsible for the conception, data gathering, analysis, and writing of this paper.
The authors declare no conflicts of interest related to this study.
UniversePG does not own the copyrights to Supplemental Material that may be linked to, or accessed through, an article. The authors have granted UniversePG a non-exclusive, worldwide license to publish the Supplemental Material files. Please contact the corresponding author directly for reuse.
Academic Editor
Dr. Toansakul Tony Santiboon, Professor, Curtin University of Technology, Bentley, Australia
Master of Instructional Technology, Touro University of New York, Manhattan, NY