Evaluating the Role of AI in Aviation Assessment (Driscoll’s Reflective Model)


What?

In this unit’s discussion, I reflected on my direct experience with the introduction of AI tools, particularly large language models, into the aviation maintenance exam department. These tools were intended to support the rapid generation of large numbers of summative exam questions. Initially, the output appeared promising: the text was fluent and well structured. However, once deployed, the generated questions revealed significant issues, including references to non-existent figures, fabricated content, and inconsistent difficulty levels.

This created unexpected challenges for both Subject Matter Experts (SMEs) and students. For non-native English speakers, overly complex or poorly structured language amplified confusion. The workload, rather than being reduced, shifted toward reviewing, correcting, and rewriting a large volume of mediocre questions. What began as an efficiency solution ultimately resulted in more time spent ensuring quality and compliance with aviation standards.

So What?

This experience demonstrated the gap between the perceived capabilities of AI and its actual reliability in high-stakes educational contexts. As Hutson (2021) points out, large language models are “a mouth without a brain,” capable of generating plausible text without true understanding. In practice, this lack of comprehension manifested in the production of flawed exam content that required human intervention to correct.

Bender et al. (2021) argue that language models reproduce linguistic patterns without understanding them. This aligns with my observation that the AI could mimic the format of exam questions without the logical consistency or referencing accuracy expected in aviation assessments. Carlini et al. (2021) highlight a different risk, showing that large language models can memorise and reproduce fragments of their training data, which raises concerns about data leakage. In this case, however, the issue was not data security but the reliability and validity of the generated content.

Reflecting on this, I realised that the implementation failed not only because of shortcomings in the technology itself, but because the tool was treated as a replacement for SMEs rather than as a means of assisting them. The absence of robust review workflows and clear usage boundaries amplified the consequences of the AI’s limitations. This has reinforced my understanding that in regulated fields such as aviation, quality assurance and domain expertise remain irreplaceable.

Now What?

Moving forward, I intend to advocate for a structured integration of AI tools into exam development, rather than using them to replace SME authorship outright. This involves defining clear boundaries for where AI can add value, such as generating draft question stems or providing linguistic variations, while ensuring that SMEs remain responsible for accuracy, referencing, and alignment with learning outcomes.

I also plan to develop a quality assurance workflow that feeds AI outputs into existing SME review processes rather than bypassing them. This will include checklists for content validation and for language calibration, the latter ensuring that questions remain clear and accessible to non-native English speakers. On a personal level, this reflection has strengthened my resolve to approach technological tools critically and strategically, recognising both their potential and their limitations.

In future implementations, I will ensure that AI tools are positioned as assistants rather than substitutes, especially in high-stakes contexts where errors can compromise educational standards and student outcomes.

References

  • Bender, E.M., Gebru, T., McMillan-Major, A. and Shmitchell, S., 2021. On the dangers of stochastic parrots: can language models be too big? In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. pp. 610–623.
  • Carlini, N. et al., 2021. Extracting training data from large language models. In: Proceedings of the 30th USENIX Security Symposium. pp. 2633–2650.
  • Driscoll, J., 2007. Practising Clinical Supervision: A Reflective Approach for Healthcare Professionals. 2nd ed. Edinburgh: Elsevier.
  • Hutson, M., 2021. Robo-writers: the rise and risks of language-generating AI. Nature, 591(7848), pp. 22–25.