How your organization can have confidence in the opportunities AI brings

In Brief

Large language models (LLMs) are probabilistic rather than deterministic, meaning they carry uncertainty (e.g., hallucinations) with a range of possible outputs. A high degree of uncertainty is often intolerable, for example, in the healthcare and financial services sectors, which combined represent about 40% of GDP. There is a clear need for robust governance and stringent risk management to unlock the most valuable use cases of generative AI.

In the 1940s, a small number of scholars from a range of fields including psychology, mathematics, engineering, economics, and political science began to explore how to create an artificial brain. At the same time, philosophers, anthropologists, and scientists such as Isaac Asimov began to study the ethical implications of doing so. Today, public interest in ethical Artificial Intelligence (AI) is exploding, sparked by the astounding capabilities exhibited by large language models (LLMs) and generative AI (GenAI) techniques.

According to internet search trends, worldwide search interest in the closely related terms of ethical AI, responsible AI, and AI safety is all at record highs and still rising. However, one related search term is lagging the others in terms of public interest: trusted AI. Why is trust in AI as an ethical framing falling out of favor in the mind of the market?

How Musanid Can Help

Artificial Intelligence Consulting Services

Our consulting approach to the adoption of AI and intelligent automation is human-centered, pragmatic, outcomes-focused, and ethical.

Building Confidence: The Next Evolution of Ethical AI

The Musanid organization has a long history of working with AI and has recently announced a unifying platform that brings together human capabilities and AI to help clients transform their businesses through confident and responsible adoption of AI. Musanid.ai leverages leading-edge technology platforms and AI capabilities, with deep experience in strategy, transactions, transformation, risk, assurance, and tax, all augmented by a robust AI ecosystem.

As the technology matures and the demands of the market evolve, so do our language, methodologies, and approach to managing the myriad of risks emanating from deploying AI.

As a result of this sustained involvement with Musanid clients and AI technologies, we are shifting our focus away from building trust and toward building confidence. This new focus is better aligned to the current and emerging risks raised by leading AI techniques and clearly better aligned to widespread concerns relating to the ethics of AI. The motivation for this change is quite elemental; LLMs are probabilistic rather than deterministic, and therefore they bring new risks that must be managed.

In a probabilistic environment, the same prompt or input can generate different outputs, and even seemingly minor perturbations in the prompt, such as an extra comma, apostrophe, or alteration in spelling that a human reader would ignore, can push the outputs of the model into a completely different space, producing unwanted or unexpected model outcomes (also known as “hallucinations”). These new models are also vulnerable to new attack strategies, such as prompt injection attacks where specially designed prompts push the model outside of established safety boundaries. These are additional risks beyond those presented by earlier AI techniques.

Such variability can’t be tolerated in highly sensitive contexts, like hospitals or clinics for treatments and diagnostics or pharmaceutical processes such as determining doses. With a high degree of model uncertainty, LLMs need to be tightly monitored and controlled in high-risk sectors, applications, or in the automation of business-critical functions.

For example, imagine an AI Copilot deployed into emergency rooms to help doctors and nurses triage ailing patients, which, to be clear, such products are already in development. A high degree of model uncertainty has the potential to do irreversible harm in such an application.

In turn, this means the incredible technology, which has been compared to fire, electricity, and the iPhone in terms of socioeconomic impacts by major technology leaders, can’t yet be fully leveraged in key sectors like financial services or healthcare, which together represent around 40% of GDP. As such, many of the most valuable use cases in the economy remain unattainable until strong governance can ensure safety, robustness, and ethicality.

What is Required to Help Ensure Safety and Robustness?

Techniques are required to help us understand the reliability and accuracy of the model’s prediction and to measure or predict the degree of uncertainty in its outputs. Techniques are also emerging to further constrain the model outputs sufficiently so that, for example, medical experts can be confident that the model is accurate, and that the algorithmic guidance won’t do any harm to the patient.

While this is not a solved problem technically, one example is “fine-tuning,” where additional text is provided to strengthen the model’s expertise in a particular expert domain. Another is “reinforcement learning,” where humans in the loop provide the model with additional guidance on uncertain edge cases. Next, we can increase the “context window” to provide a source of “ground truth” for the model to reference check, and finally, there are “adversarial” approaches where a dueling model essentially reviews and fact checks the outputs of the first model to defined boundaries, such as the language in the United Nations (UN) Charter on human rights.

But none of these technical options in isolation or combined are adequate for enterprise adoption at scale. Enterprises need a solution-level approach that encompasses the full model life cycle, from data collection and engineering to model training and validation, to deployment and real-time risk monitoring. Only such a holistic view can give business leaders confidence that the risks are manageable, and that adopting GenAI will be value accretive.

The Big Challenge

In statistics, the concept of a “confidence interval” is a simple way to describe the degree of uncertainty for a given parameter, providing a range of possible estimates of something unknown. This is a much more appropriate framing for models producing probabilistic outputs. The growing public debate about ethical AI is therefore more about safety and responsibility. We believe it is possible to ensure AI is safe, and we believe it is the responsibility of businesses who deploy AI to do so.

Our teams are here to help you invest in AI with confidence, unlocking the most value-generating use cases to drive exponential growth. Our people-centric approach to AI helps enable the technology to augment your talent, driving efficiency and productivity gains across business functions. And world-leading multidisciplinary professionals in risk, strategy, technology, and transformation help to ensure adoption is aligned with your organization’s purpose, culture, values, and key stakeholders so that AI drives positive human impact.

The views reflected in this article are the views of the author and do not necessarily reflect the views of the global Musanid organization or its member firms.

Summary

A holistic, end-to-end approach to AI risk management is required through the model life cycle to instill the confidence required for enterprise adoption of GenAI.