Global Dialogue Briefing on Capacity Building for AI Governance
In the lead-up to the Global Dialogue on AI Governance, the Simon Institute for Longterm Governance is hosting a briefing series together with the Permanent Missions of Singapore, Kenya, and Norway. The second briefing focused on capacity building for AI governance, corresponding to the pre-defined Global Dialogue topic 4b (capacity gaps), with relevance to topics 4c (social, economic, cultural, and linguistic implications of AI) and 4d (interoperability of AI governance).
The event featured welcome remarks from H.E. James Ndirangu Waweru (Kenya) and H.E. Tormod Cappelen Endresen (Norway). Cecil Abungu (Cambridge), coordinator of ILINA, an African-led AI research program, then presented on the needs and challenges of capacity building for AI governance. An open discussion in a fishbowl format concluded the event.
The case for AI governance capacity building
a) AI testing as a potential capacity gap
There are many AI capacity building initiatives. In the multilateral system, these range from the UN Global Fund for AI, with a funding target of $3 billion, to the World Bank’s Digital Development Partnership, the ITU’s Partner2Connect Digital Coalition, and the UNDP’s AI Hub for Sustainable Development. Bilateral and minilateral programs include the EU’s Global Gateway, China’s Digital Silk Road and Capacity-Building Action Plan, the UK-Canada AI for Development program, the Swiss International Computation and AI Network, and the American AI Exports Program. The main focus of these initiatives is building the infrastructure and resources to develop and deploy AI systems.
However, with the possible exception of Germany’s support for building national AI strategies, few capacity building efforts target the skills and infrastructure needed for AI governance itself. Specifically, the ability to evaluate AI systems and to conduct post-market monitoring of their impacts is a critical enabler of well-thought-through AI governance. Evaluations test for capabilities (what an AI model is able to do, including dangerous actions), propensity (what it does by default when given choices), and control (whether guardrails can be bypassed by someone actively trying to misuse the AI model): critical information for deciding whether and how to deploy a given AI system. Currently, only about a dozen countries have dedicated AI testing institutes. The International Network for Advanced AI Measurement, Evaluation and Science has ten members.
b) Different levels of state capacity are likely to persist
A recurring pattern across different types of divides is that absolute access gaps can be closed over time, whereas relative capacity gaps are likely to persist because they are largely a consequence of broader economic inequality. As one speaker noted, if a country needs to choose between vaccines and an advanced AI testing facility, it will likely choose the former. It would therefore be unrealistic to assume that every country will have the same level of state capacity to test AI and govern its deployment.
Domains such as medicine and civil aviation offer established methods for managing state capacity differentials in testing, including:
- Maturity frameworks define progressive levels of testing capacity with specific indicators, enabling countries to self-assess and plan their development. Examples include the WHO Global Benchmarking Tool for medicine regulators and ICAO’s Universal Safety Oversight Audit Programme for aviation.
- Regulatory reliance and recognition allow a national authority to take into account assessments by a trusted reference authority while retaining sovereignty over the final decision. In pharmaceuticals, this is codified in WHO Good Reliance Practices. In aviation, bilateral agreements enable tiered mutual recognition.
- Third-party and delegated testing enables private entities to perform testing under government oversight. Contract Research Organizations do this in pharmaceuticals, Designated Engineering Representatives in aviation.
- Regional pooling allows groups of states to share regulatory functions they could not sustain individually. In civil aviation, Regional Safety Oversight Organizations pool regulatory functions across smaller states. In pharmaceutical regulation, Regional Centres of Regulatory Excellence across Africa provide training and joint assessment.
c) What a differentiated AI testing ecosystem might look like
In the absence of a new international AI organization that would centralize AI testing, we should expect an ecosystem with differentiated AI testing capabilities. One way to approach the division of labor in a healthy AI testing ecosystem is along a testing lifecycle whose distinct phases may require different capabilities and geographic distribution.
- Internal pre-release testing: Requires access to model weights and large amounts of computing power. Such testing may only be feasible where AI companies physically locate their training infrastructure and R&D teams.
- Pre-deployment evaluation: Major markets can demand this as a condition of market access. Evaluation at this stage may include interpreting and supplementing rather than replicating frontier testing.
- Contextual and complementary testing: Smaller markets may focus on aspects such as locally-developed models, local language performance, and cultural context.
- Post-deployment monitoring: Could happen everywhere AI is used. In pharmaceuticals, 170+ countries contribute to the WHO’s adverse reaction database.
A key insight shared by Abungu was that not all AI risks are equally context-dependent. Misalignment risks (whether a model pursues goals consistent with human intent) are largely shared across deployment contexts, since the underlying failure modes are properties of training and architecture rather than of the environment into which the model is deployed. For these risks, countries with lower testing capacity may be able to rely on assessments from institutions that already have significant capabilities, such as the UK AI Security Institute.
In contrast, misuse risks, and to some extent robustness to adversarial inputs, can be deeply context-dependent. They depend on local languages, cultural patterns, economic structures, and social dynamics. For these risks, reliance on external assessments alone is inadequate, and local testing capacity is likely to be more effective.
What does contextual risk mean concretely?
In Myanmar, a UN fact-finding mission found that social media played a significant role in fueling violence against the Rohingya, in part because there was virtually no Burmese-language content moderation. Similarly, tests for manipulative AI content in English may miss patterns in low-resource languages or in different cultural formats, such as SMS-style voter mobilization.
Abungu highlighted M-Pesa fraud as a concrete example of locally contextual AI misuse risks. Mobile money transactions account for over half of Kenya’s GDP, and M-Pesa controls 99% of this market. The result is a manipulation surface that does not exist in many Western economies. The existing M-Pesa fraud ecosystem runs almost entirely on social engineering, using local languages and impersonating official Safaricom representatives.
Maintaining interoperability
The fishbowl discussion surfaced a key tension between localized testing needs and the risk of regulatory fragmentation. Large AI companies have a legitimate interest in limiting fragmentation from divergent requirements across jurisdictions, while many countries have a legitimate interest in ensuring risks specific to their context are addressed.
Capacity building may help bridge this tension. Countries with the skills to interpret test results and stress-test frameworks locally are more likely to align with interoperable approaches than to default to passive deference or reactive, fragmented regulation.
d) What capacity building can contribute to a healthy ecosystem
Participants identified several types of support that capacity building could provide to enable this differentiated ecosystem:
- Institutional twinning: Pairing established AI testing institutions with emerging ones, as in pharmaceuticals (WHO’s Coalition of Interested Parties Network pairing 33 donors with developing-country regulators) or in biosafety (World Organisation for Animal Health laboratory twinning).
- Compute: Some forms of AI evaluation require significant computational power. Providing affordable access to compute through programs like ICAIN could help to lower barriers.
- Financial support: The proposed UN Global Fund for AI could provide (co-)funding for emerging national and regional AI testing bodies, supporting them through the initial years until they can be sustained independently.
- Common frameworks: Maturity frameworks, shared evaluation methodologies, and interoperable reporting standards can provide a common language and reduce the cost of building governance capacity from scratch.
What the Global Dialogue could contribute to AI governance capacity building
Many capacity building efforts exist, but they tend to neglect the skills and infrastructure needed for AI governance, and specifically AI testing. The UN Secretary-General’s report on innovative voluntary financing options for AI capacity-building explicitly mentions equipping “institutions with the knowledge, skills and tools required to lead holistic national AI planning processes and govern AI responsibly” as one capacity gap. Similarly, last year’s Governance Day at the AI for Good Summit included ten principles for AI governance, among them “transparency as a cornerstone of trust” and “capacity for all, not just a few.” As UNCTAD notes, a “lack of systematic evidence from developing countries limits their capacity to design effective interventions.”
These gaps make capacity building for AI governance a natural fit for the Global Dialogue. Below are exploratory questions it might address:
Where is regulatory reliance sufficient, and where is local AI governance capacity essential? For risks that are properties of the model itself, such as deceptive behavior, reliance on leading testing institutions may be adequate. For context-dependent risks, such as social engineering in local languages, it may not be. Could the Global Dialogue help establish consensus on this question?
Could institutional twinning accelerate the development of AI governance capacity? What would it take to broker such arrangements at scale, and what role could the UN system play?
Is there a need for a maturity framework for AI governance readiness? Which institutions could credibly develop one that is useful rather than burdensome for resource-constrained countries?
Should the proposed UN Global Fund for AI earmark a dedicated portion of funding for AI testing capacity? Keeping pace with AI capabilities, opportunities, and risks requires states to have both compute and skills, and thus funding.