Better data sets won’t solve the problem — we need AI for Africa to be developed in Africa

In May 2023 Sam Altman, chief executive of OpenAI in San Francisco, California, embarked on a 17-city world tour to promote artificial intelligence (AI) tools. One of the stops in the first week of the tour was Lagos, Nigeria. Yet, research assessing the performance of OpenAI’s chatbot ChatGPT on a data set of 670 languages shows that African languages have the least support1. The large language model GPT-4, which underlies ChatGPT, recognizes sentences written in Hausa, Nigeria’s most widely spoken language, only 10–20% of the time.

As a computer scientist and former machine-learning engineer at Google AI in Accra, I have long known about the limitations of importing AI tools devoid of local context into Africa. We need to give African computer scientists the opportunity to develop home-grown solutions.

For instance, in 2018, my colleagues and I set out to track changes in the built environment of South Africa’s historically Black townships. Because the post-apartheid constitution prioritizes uplifting disenfranchised communities, our project aimed to assess whether improvements in well-being were visible in satellite imagery. We quickly discovered the limitations of the AI models commonly used to detect features such as houses and street patterns in aerial images: trained on Western cities, which often have grid-like layouts, they struggled to adapt to the nation’s unique urban landscapes.

It took us four years to develop an AI model tailored to the local context. Meanwhile, Western researchers who had access to similar satellite and census data focused on using night-time light levels to estimate poverty rates in several African countries2 — an approach that is doomed to fail in South African townships. There, well-lit streets, a legacy of apartheid-era discriminatory policing, could easily be misread as signalling economically prosperous urban zones.
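
To make that failure mode concrete, here is a minimal toy sketch in Python. Everything in it (the numbers, the variable names and the simple regression itself) is invented for illustration, not taken from the study cited; it only shows why a proxy learned in one context misleads in another.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training regions where night-time brightness genuinely tracks wealth.
wealth = rng.uniform(0.0, 1.0, 200)               # hidden ground truth (0 = poorest)
brightness = wealth + rng.normal(0.0, 0.05, 200)  # lights scale with wealth here

# Fit the naive proxy: predict wealth from brightness alone.
slope, intercept = np.polyfit(brightness, wealth, 1)

# A hypothetical township: low wealth, but brightly lit because street
# lighting was installed for policing, not because of prosperity.
true_wealth, observed_brightness = 0.2, 0.9
predicted_wealth = slope * observed_brightness + intercept

print(f"true wealth ~{true_wealth:.1f}, model predicts ~{predicted_wealth:.2f}")
# The regression reads bright streets as prosperity and overestimates
# well-being: exactly the misreading described above.
```

The point is not the specific numbers: no amount of extra satellite data corrects a proxy whose meaning differs across contexts.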

Our challenging experience with ‘state-of-the-art’ AI models highlights the importance of local context and lived experience. That’s why African-built small language models, such as Lesan AI — a language translation and transcription tool — can match and even outperform Western counterparts on tasks such as speech-to-text transcription. To build the model, Lesan’s co-founder, Asmelash Hadgu, who is based in Berlin, hired fluent speakers of Tigrinya and Amharic to create unique data sets and then used those to train the AI. As a technologist who speaks both languages, Hadgu was able to build a rich data set by focusing on the most descriptive parts of his languages.

Even if big technology firms manage to replicate this strategy, the underlying problem of diverging priorities remains. AI technologies are biased towards the main task for which they are built. Like humans, language models can experience cognitive overload — for instance, when asked to complete a mathematical puzzle while remembering to communicate in Swahili. This results in reduced performance, which cannot be fixed by just adding more data.

It is this difference in prioritization that is the missing ingredient in current discussions about making AI more inclusive. Which problems people focus on, what they regard as informative data and what types of failure they consider acceptable are all human choices.

For example, during the COVID-19 pandemic, Elaine Nsoesie, a global-health researcher at Boston University, Massachusetts, used her knowledge of how Cameroonians discuss illnesses to build a system for tracking influenza-like outbreaks using search-engine and social-media data3. Although the data were publicly available, contextual knowledge guided key decisions on which terms and signals in them were useful. The lesson that African researchers must learn as AI becomes more intimately woven into people’s lives is that we cannot expect those who do not understand how we live to build tools for us.
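
As a hedged sketch of what such a decision looks like in practice (the queries and terms below are invented placeholders, not Nsoesie’s actual data or vocabulary), matching against locally informed phrasings surfaces signal that generic English terms miss:

```python
# Invented sample of health-related search queries; in practice these would
# come from search-engine or social-media data streams.
queries = [
    "palu treatment",        # hypothetical local shorthand for malaria
    "corps chaud remedy",    # 'hot body', a colloquial way to describe fever
    "influenza symptoms",
    "corps chaud enfant",
    "palu et fievre",
]

def count_matches(terms: list[str]) -> int:
    """Count queries that mention any of the given illness terms."""
    return sum(any(term in query for term in terms) for query in queries)

# Generic English terms miss most of the locally phrased queries...
print("generic:", count_matches(["influenza", "flu", "fever"]), "of", len(queries))
# ...whereas locally informed terms capture how illness is actually discussed.
print("local:  ", count_matches(["palu", "corps chaud", "fievre"]), "of", len(queries))
```

The same publicly available stream yields very different surveillance value depending on whose linguistic knowledge shaped the term list.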

Currently, around 1% of global venture capital flows to Africa. Even mission-driven philanthropic organizations have invested most of their funds in Western AI companies. This needs to change. African businesses should prioritize using locally developed products, and the public sector must adopt a ‘buy local’ approach where possible.

The open-source community should focus on creating smaller, task-specific AI systems that developers in Africa can more easily adapt to their local needs. Bigger isn’t always better. A thriving open-source ecosystem could empower start-up firms in Africa to compete with large, general-purpose solutions designed by big technology companies.

The continent’s AI research community needs to develop a coherent vision for AI regulation. Only then can Africa’s collective voice be better represented in international AI-governance forums. Otherwise, decisions about AI’s purpose, functionality and safeguards will be made by wealthy nations, even though the impacts are felt by people globally.

For too long, African data sets have been labelled as too complex or irregular, and our languages have been deemed too under-resourced for large AI systems to learn effectively. Yet groups across Africa create valuable technology with these limited resources. It’s time to support them.

Competing Interests

The author declares no competing interests.

