Long Read · Language · AI · Culture
Somewhere in Lagos, a first-generation entrepreneur tries to use an AI assistant to file her
tax returns. The app doesn't understand Yoruba. She types broken English, gets broken answers.
She closes the app and walks to the tax office instead.
This is not a fringe story. It is the daily reality for hundreds of millions of people.
And it is the exact problem that AI built only for English-speaking, text-literate users
will never solve. The question is not whether this matters.
The question is who decides to do something about it.
Chapter One
Why the problem is deeper than it looks
When AI researchers talk about "data," they are really talking about power. The more data a language has, the smarter, more fluent, and more commercially viable the AI built on it becomes. English benefits from billions of web pages, decades of digitized books, and trillions of tokens scraped from the modern internet. Swahili has a fraction of that. Hausa has less. Dholuo, Kamba, Tigrinya: almost nothing at all.
This is the data desert. It is not a metaphor but a measurable, documented reality. AI models trained on English-dominated data don't merely fail to speak Hausa; they fail to understand Hausa users even when those users try to use English, because the cultural context, the idioms, and the conceptual frameworks underlying the words are invisible to the model.
The economic cost is not theoretical. When AI tools cannot serve local populations, entire sectors (agricultural extension services, health diagnostics, financial literacy tools, civic services) remain out of reach. The hidden cost of ignoring local languages is not measured in lost app downloads. It is measured in lives lived at the margins of the digital economy.
"A model that has never seen your world cannot understand your words, even when you use someone else's language to describe it."
Chapter Two
Why voice AI will bypass text AI entirely
There is a pattern that keeps repeating itself, and it is worth paying attention to. In the 1990s, economists were confident: building fixed-line telephone infrastructure would take decades. What actually happened is one of the quietly remarkable things in modern economic history. Mobile phones reached places landlines never went. Mobile money built an entirely new financial system, without a single bank branch or postal address. It didn't wait for the old infrastructure to arrive. It went around it.
The same pattern is now forming around AI. The global conversation has largely revolved around text: chatbots, document tools, and coding assistants, all optimized for a modality that many people encounter daily but use with real friction. Not because they are less capable. Because the technology was built for someone else.
Voice changes the equation. No keyboard. No specific script or spelling required. No requirement that your dialect has been formally encoded anywhere on the internet. You just speak. Whether AI can truly listen, for the hundreds of millions currently left outside the system, is the question the industry is only beginning to ask.
"The next era of AI won't be typed in. It will be spoken."
Chapter Three
The gap between translation and true understanding
Try a thought experiment. Ask an AI translation tool to render a Yoruba proverb into English. A good one will give you something: the words, the grammar, a sentence that technically makes sense. What it almost certainly won't give you is the social weight the proverb carries, or the specific gravity it takes on when used in a complaint: not as an insult, but as a quiet plea to be treated with dignity. The machine translates the surface. The depth stays behind.
This is not a technical limitation waiting to be solved with more compute. It is a structural one. A language is not a code for transmitting thoughts. It is a compressed archive of a people's worldview: their humor, their grief, their way of cutting through a complex situation with a single phrase that would take three English paragraphs to approach.
Many languages spoken across this region are tonal: the pitch of a single syllable can reverse a word's meaning entirely. Most carry intricate systems of social register, in which your word choices depend on who you are talking to and in what context. And then there is code-switching, the fluid, natural movement between two or three languages mid-sentence that is entirely ordinary in urban Nairobi, Lagos, or Accra. That is not a quirk. That is the communication.
Building AI that navigates all of this is not simply a technical problem. It is a collaboration between people who understand engineering and people who carry the language: its rhythms, its silences, its unspoken agreements. The data scientist builds the model. The linguist teaches it what the data alone cannot tell it.
"The data scientist builds the model. Culture writes its conscience."
Chapter Four
The most open territory in AI right now
Nobody knows exactly what the next decade of AI looks like for the majority of the world's languages. That's the honest starting point. The field is moving fast and unevenly, and what gets built depends on choices being made right now: about where resources flow, whose voices get included in training data, and whether the people making those decisions have ever seriously asked what AI needs to do for someone whose language has barely registered on the internet.
What does seem increasingly clear is that the crowded part of the AI market, where English-language tools compete over similar users in similar contexts, is beginning to mature. The genuinely open frontier is somewhere else entirely. It is the farmer, the trader, the student, the first-generation entrepreneur, the nurse in a rural clinic: people who are economically active, digitally curious, and young, and for whom current AI tools have simply never worked.
The future, built thoughtfully, belongs to models that treat each language as a first-class input: not a translation of English intent, but a distinct worldview with its own logic and its own communities of speakers who can push back when the model misunderstands. It belongs to voice interfaces that understand the dialect spoken in a specific part of northern Kenya. It belongs to AI that has learned, through data gathered with consent and compensated fairly, what it actually means to live and think and speak as someone whose world the internet has barely noticed.
The goal is not to render the world in English for AI. It is to build AI that already knows how to listen.
"The most powerful AI of the next decade will not be the largest model. It will be the one that the most people can actually talk to, trusting that it understands."
Stalwart Tech Solutions builds the data pipelines and human-intelligence teams
that make AI work for the languages the world has yet to hear.