June 9, 2026

Global Dialogue Briefing on Maintaining Human Oversight Over Civilian AI

Event
AI Governance
Multilateralism

By Kevin Kohler

In the lead-up to the first Global Dialogue on AI Governance, the Simon Institute for Longterm Governance hosted a briefing series together with the Permanent Missions of Singapore, Kenya, and Norway. Following two earlier briefings, the third convening focused on maintaining human oversight over civilian AI. This corresponds to the fourth thematic cluster suggested by the Co-Chairs, based on 4e (“respect for and protection and promotion of human rights in the field of AI”) and 4f (“the transparency, accountability and robust human oversight of AI systems in a manner that complies with international law”) of A/RES/79/325.

The event featured welcome remarks from H.E. Tormod Endresen (Norway). Wafa Ben-Hassine (UN OHCHR) provided input on human oversight requirements in the human rights system. Nick Spragg (member of safeguards at a frontier AI provider) provided input on technical challenges and solutions in operationalizing human oversight. Subsequently, a fishbowl discussion invited all participants to share their ideas, concerns, and questions, ranging from user experience to insurance to cybersecurity. Max Stauffer (Simon Institute) closed the briefing series, thanking everyone for the robust discussions.

1. Human oversight is not optional

Ben-Hassine highlighted that human oversight is an existing legal requirement in a range of contexts. For example, Article 14 of the EU AI Act explicitly mandates human oversight for high-risk AI systems. Similarly, Article 8 of the Council of Europe Framework Convention on Artificial Intelligence and Human Rights, Democracy and the Rule of Law requires that countries ensure risk-proportionate oversight. In addition, Art. 22 GDPR gives individuals the right not to be subject to decisions based solely on automated processing where those decisions produce legal or similarly significant effects, such as in employment.

Ben-Hassine also stressed that, under the UN Guiding Principles on Business and Human Rights, the responsibility to prevent and address human rights impacts extends to companies. First, this includes early and continuous human rights due diligence. Second, oversight must be built into design before deployment and maintained during real-world use. Third, human rights due diligence includes ensuring that downstream users do not misuse AI in ways that violate human rights. Fourth, it includes transparency and documentation about how risks are managed. Fifth, particularly in light of emerging risks, systems must be tested against unexpected behaviour, including potentially deceptive or misaligned outputs that lead to discrimination. This testing is essential if oversight is to remain credible over time.

2. The challenge of scaling human oversight

The primary challenge for human oversight over civilian AI is the scale and speed of AI decision-making. AI can “flood the zone” with so many decisions and actions to review that humans cannot meaningfully keep up. As Spragg put it, discussions on meaningful human control emerged with the expectation that only a small number of consequential decisions would need to be reviewed. However, given the deployment scale of AI systems, it will be impossible to review every single AI output. For context, according to Goldman Sachs, AI systems already produce about 5 quadrillion tokens per month, and this is projected to increase further. At that rate, AI systems generate as many words as the entire English Wikipedia roughly every 3 seconds, and roughly as much text per month as all of humanity speaks in the same period.
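The scale comparison above can be checked with a back-of-the-envelope calculation. The sketch below is illustrative only: the 5 quadrillion tokens per month is the figure cited in the briefing, while the words-per-token ratio, the size of English Wikipedia, the world population, and the number of words a person speaks per day are rough assumptions introduced here, not figures from the briefing.

```python
# Back-of-the-envelope check of the scale comparison.
# Only TOKENS_PER_MONTH comes from the briefing; every other constant is an
# assumed round number chosen for illustration.

TOKENS_PER_MONTH = 5e15        # cited in the briefing (attributed to Goldman Sachs)
WORDS_PER_TOKEN = 0.75         # assumption: common rule of thumb for English text
WIKIPEDIA_WORDS = 5e9          # assumption: rough word count of English Wikipedia
PEOPLE = 8e9                   # assumption: approximate world population
SPOKEN_WORDS_PER_DAY = 16_000  # assumption: average words spoken per person per day
DAYS_PER_MONTH = 30

SECONDS_PER_MONTH = DAYS_PER_MONTH * 24 * 60 * 60


def ai_words_per_second() -> float:
    """Words generated by AI systems per second under the assumptions above."""
    return TOKENS_PER_MONTH * WORDS_PER_TOKEN / SECONDS_PER_MONTH


def seconds_per_wikipedia() -> float:
    """Seconds needed to generate one English Wikipedia's worth of words."""
    return WIKIPEDIA_WORDS / ai_words_per_second()


def ai_to_speech_ratio() -> float:
    """Monthly AI-generated words divided by monthly human spoken words."""
    ai_words = TOKENS_PER_MONTH * WORDS_PER_TOKEN
    spoken_words = PEOPLE * SPOKEN_WORDS_PER_DAY * DAYS_PER_MONTH
    return ai_words / spoken_words


if __name__ == "__main__":
    print(f"AI words per second:      {ai_words_per_second():.2e}")
    print(f"Seconds per Wikipedia:    {seconds_per_wikipedia():.1f}")
    print(f"AI words / spoken words:  {ai_to_speech_ratio():.2f}")
```

Under these assumptions the result is a little over three seconds per Wikipedia and a ratio close to one, which is consistent with the orders of magnitude quoted above. Different assumptions would shift the numbers but not the conclusion that per-output human review cannot scale.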

A second challenge is that humans have a bias to perceive machine decisions as “neutral”, even though AI systems have various values and trade-offs embedded in them. As one participant noted, AI agents are also prone to making up answers and deceiving when asked to explain what they have done. Combining decision volume with perceived neutrality, there is a risk that human review loses meaningful critical judgement and becomes symbolic rubber-stamping of AI decisions. Ben-Hassine identified this as the challenge of automation bias.

To provide orientation on what can be overseen, and highlight that this does not always have to be at the level of an individual decision, Spragg divided human oversight into three layers:

Pre-deployment assurance is the evaluation of the capabilities and safety of AI models before they are released by AI companies. This is important to ensure AI models are aligned and their misuse safeguards generally hold. Independent evaluators should be involved in conducting such evaluations. Currently, most pre-deployment assurance is concentrated in two jurisdictions, the US and the UK.
Runtime safeguards focus less on the AI model itself than on user requests. They use a classifier to identify when a user makes illegal or harmful requests. Privacy-preserving tools like Clio capture a high-level summary of discussion topics and cluster them across multiple users. This can help AI service providers identify problematic use patterns.
Decision-level review is when a human reviews an AI output before making a decision. The reviewer will often not be the AI producer itself, but someone using an AI product within its terms of service for decision support, such as for assessing credit risk, grading a paper, giving medical advice, or making employment decisions.
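The idea behind the runtime layer, reporting usage patterns only in aggregate across users, can be illustrated with a toy sketch. This is not how Clio or any real provider implements it. The function name, the threshold, and the assumption that each conversation has already been reduced to a coarse topic label are all hypothetical and introduced here for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical threshold: a topic is only reported if it appears for at least
# this many distinct users, so that no single user's activity is exposed.
MIN_DISTINCT_USERS = 3


def aggregate_topics(
    records: Iterable[Tuple[str, str]],
    min_users: int = MIN_DISTINCT_USERS,
) -> Dict[str, int]:
    """Count distinct users per topic and drop topics below the threshold.

    `records` is an iterable of (user_id, topic_label) pairs. The topic labels
    are assumed to come from an upstream summarization step that is not shown.
    """
    users_by_topic = defaultdict(set)
    for user_id, topic in records:
        users_by_topic[topic].add(user_id)
    return {
        topic: len(users)
        for topic, users in users_by_topic.items()
        if len(users) >= min_users
    }


if __name__ == "__main__":
    sample = [
        ("u1", "phishing templates"),
        ("u2", "phishing templates"),
        ("u3", "phishing templates"),
        ("u1", "phishing templates"),  # repeat by the same user counts once
        ("u4", "recipe ideas"),
    ]
    print(aggregate_topics(sample))
```

The design choice illustrated is that the reviewer sees topic-level counts rather than individual conversations, which is what allows pattern detection without per-user inspection.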

3. Operationalizing human oversight requirements

As one participant stressed, there is no standalone “human right to human oversight”. Human oversight is a binding obligation instrumental to ensuring that human rights, such as the right to life, right to privacy, right to non-discrimination, and the right to due process and effective remedy, are upheld.

Consequently, we may want to understand human oversight as a substantive rather than a procedural requirement. In other words, rather than viewing human oversight in a civilian context as equal to having a human-in-the-loop for all decisions, what matters is that consequential decisions are compliant with human rights, including the possibility of review and remedy. This can mean focusing human oversight on layers where meaningful review is more feasible.

Spragg suggested that to operationalize human rights due diligence and oversight requirements, there is a need to provide more granular requirements:

At what layer do we place oversight? Pre-deployment assurance, runtime safeguards, decision-level review
At what cadence is there a review? Per release, per session, per decision
By whom? AI model producer, AI model deployer, regulator, user
Recourse and remedy should define what is owed to affected users
How do the requirements carry across borders?
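One way to picture these dimensions is as fields of a single record, so that each oversight requirement states its layer, cadence, reviewer, remedy, and geographic scope explicitly. The sketch below is a hypothetical data structure, not a schema proposed at the briefing. The class and field names are assumptions, and the enumerated values simply restate the options listed above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Tuple


class Layer(Enum):
    PRE_DEPLOYMENT_ASSURANCE = "pre-deployment assurance"
    RUNTIME_SAFEGUARDS = "runtime safeguards"
    DECISION_LEVEL_REVIEW = "decision-level review"


class Cadence(Enum):
    PER_RELEASE = "per release"
    PER_SESSION = "per session"
    PER_DECISION = "per decision"


class Reviewer(Enum):
    MODEL_PRODUCER = "AI model producer"
    MODEL_DEPLOYER = "AI model deployer"
    REGULATOR = "regulator"
    USER = "user"


@dataclass(frozen=True)
class OversightRequirement:
    """Hypothetical record covering the five questions listed above."""
    layer: Layer
    cadence: Cadence
    reviewer: Reviewer
    remedy: str                     # what is owed to affected users
    jurisdictions: Tuple[str, ...]  # where the requirement applies


def uncovered_layers(requirements: Iterable[OversightRequirement]) -> List[Layer]:
    """Return the layers that no requirement addresses, in definition order."""
    covered = {req.layer for req in requirements}
    return [layer for layer in Layer if layer not in covered]


if __name__ == "__main__":
    # Illustrative values only; the pairing of layer, cadence, and reviewer
    # is an example, not a recommendation from the briefing.
    example = OversightRequirement(
        layer=Layer.DECISION_LEVEL_REVIEW,
        cadence=Cadence.PER_DECISION,
        reviewer=Reviewer.MODEL_DEPLOYER,
        remedy="human re-review on request",
        jurisdictions=("example-jurisdiction",),
    )
    print([layer.value for layer in uncovered_layers([example])])
```

Writing requirements in this form makes gaps visible: a set of rules that only addresses decision-level review, for instance, leaves the pre-deployment and runtime layers unspecified.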

Spragg particularly highlighted that the sharing of deployment intelligence, meaning information on how users use and misuse AI models in the real world, is still under-developed and under-institutionalized. The “misuse surface” increases with every new model release and with increased deployment scale. Yet, without due diligence standards and guidance from policymakers, companies in the AI industry may prefer not to monitor how their products are misused, since actively knowing about such misuse could expose them to legal liability.

4. What could the Global Dialogue contribute?

As H.E. Tormod Endresen highlighted, even if there is no political will for a comprehensive agreement on AI, it is very important to work with the multi-layered patchwork of existing frameworks and norms. AI agents are a qualitative step-change, and there is a risk of a gap between existing legal expectations of human oversight and reality, if we don’t work to operationalize human oversight in this new context.

With the New Delhi Frontier AI Impact Commitments, policymakers have expressed the expectation that frontier AI model companies share anonymized insights regarding how their AI systems are used in practice. However, this was mostly focused on the impact of AI on jobs, skills, productivity, and economic transformation. To go further, the Global Dialogue could grapple with the following questions:

Could the Global Dialogue’s thematic cluster on human rights and human oversight help define the expectation that frontier AI model companies share anonymized insights relevant to human rights due diligence?
Could the Global Dialogue encourage AI industry-coordination bodies to work towards common tools and standards on deployment intelligence?

Finally, as one participant reminded the audience, there is a civil society-led call for the UN to advance discussions on AI red lines, which touches on questions adjacent to human oversight. Pope Leo XIV’s encyclical “Magnifica Humanitas”, which calls for preserving the human person in the age of AI, has intensified global normative discussions, with even U.S. Vice President JD Vance asserting that “decisions over life and death must be made by humans and not machines.”

Could the Global Dialogue as a multistakeholder venue be a space for normative discussions on where, within the civilian sphere, we should draw boundaries to protect human oversight or human control?

This marks the end of this briefing series. Additional briefings are organized by United Nations University together with the Geneva Science Policy Interface and the Simon Institute based on briefs by the Secretary-General’s Scientific Advisory Board on AI Deception and Verification of Frontier AI. We also look forward to engaging with stakeholders around the first Global Dialogue on July 6 and 7.
