This is part of a series on conversational intelligence: where the intelligence is today, and how to use it well in business.
Once we had embedded conferencing working, a different kind of problem surfaced almost immediately. It was clear that connecting people was not the same as helping them understand one another across language. That was the next problem we began trying to solve.
This was very early. We were building the technology, not operating at a scale where active multilingual calls were happening routinely. We tested. We pushed the systems. And even in those early efforts, it was clear the technology could remove distance without removing the gap.
That was the beginning of the work that eventually led to real-time translation within the communication system. And it was also the beginning of understanding something most conversations about translation still underestimate.
Translation is not a word problem
When most people think about translation, they think about swapping one set of words for another. English goes in, Spanish comes out. Dictionary on one side, dictionary on the other. A fast enough machine makes the swap in real time.
That model works for signage. It does not work for conversation.
A conversation is not a sequence of words. It is a sequence of moments in which two people are trying to build a shared understanding. The words are one layer. Timing is another. Tone is another. Register is another. The decision a speaker makes in the half-second before they answer a question carries as much information as the answer itself. Translation that only handles the words is a translation that delivers the transcript of a conversation without delivering the conversation.
You notice this immediately when you use early translation tools in a real exchange. The words are correct. The meaning is off. The speaker said something warm, and the system delivered something neutral. The speaker said something carefully, and the system delivered something confidently. The speaker paused in a way that meant something, and the system flattened the pause into nothing.
Every one of those losses is small. Put them together across a fifteen-minute conversation, and the person on the other end is talking to a stranger.
What we were trying to solve
The work at RealComm Global was not about building a better translator in the standard sense. Others were already advancing word accuracy. What we were trying to solve was different. The question was whether a system could carry more than words.
That meant preserving the signals that shape meaning in the first place. Tone. Pacing. Dialect. Emotional register. Cultural context. A question and a challenge can use exactly the same words; the difference lives in everything around them.
Even early on, it was clear that translation alone was only part of the problem.
A translated sentence can be correct and still lose the speaker.
That was where sentiment entered the work, not as a marketing category, but as a practical problem. If frustration, hesitation, urgency, or restraint do not survive the crossing, part of the communication has been lost.
The aim was not a perfect translation. It was to preserve as much of the meaning, and as many of the signals shaping that meaning, as possible through the crossing.
In that sense, the problem was closer to what a skilled human interpreter does. Not only converting language, but conveying the person speaking.
Some of that was only partially possible with the tools available at the time. We worked with what the technology could hold and quickly learned what it dropped.
And what it dropped mattered.
A translation can be technically correct and still flatten emotional weight. It can preserve facts while losing hesitation. It can carry vocabulary while losing cultural meaning.
That is where the business risk sits. Not in mistranslated words alone, but in what gets stripped away when communication is reduced to words.
The words may arrive.
The conversation may not.
What got lost when translation was only about words
In the earlier stages of the work, much of what was lost came down to a limitation of the technology itself. Emotional weight did not carry. Hesitation disappeared. Tone flattened. Context thinned.
A translation could be technically correct and still lose the speaker.
That was not usually a dramatic failure. It was cumulative. Meaning eroded a little at a time.
The words arrived.
The conversation often did not.
What this means for business today
Real-time translation in 2026 is significantly beyond where it was when we were working on it in the years leading up to the patent. More languages. Lower latency. Better acoustic models. Better on-device performance.
But the biggest shift is not just technical improvement. It is that systems can preserve more of the communication itself.
Elements of speech such as tone, pacing, emphasis, and prosody (how something is said, not just what is said) can now be modeled in ways that were difficult or impossible then. Voice can be synthesized across languages while carrying elements of cadence and emotional register that earlier systems could not support.
A system can hear signals in speech. Emotion mapping is the process of making sense of those signals.
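To make that distinction concrete, here is a minimal illustrative sketch, not anything from the patented system: a hypothetical `map_emotion` helper that turns a few prosodic signals a speech front end might report into a coarse emotional label. The signal names and thresholds are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProsodySignals:
    """Coarse per-utterance signals a speech front end might report."""
    mean_pitch_hz: float   # average fundamental frequency
    pitch_range_hz: float  # spread between highest and lowest pitch
    energy: float          # loudness, normalized 0..1
    pause_ratio: float     # fraction of the utterance spent silent


def map_emotion(signals: ProsodySignals) -> str:
    """Map raw prosodic signals to a coarse emotional register.

    The thresholds here are illustrative placeholders, not calibrated values.
    """
    if signals.pause_ratio > 0.4:
        return "hesitant"      # long silences dominate the utterance
    if signals.energy > 0.7 and signals.pitch_range_hz > 80:
        return "urgent"        # loud and highly varied pitch
    if signals.energy < 0.3 and signals.pitch_range_hz < 30:
        return "restrained"    # quiet and flat delivery
    return "neutral"
```

In practice, production systems learn this mapping from labeled speech rather than hard-coding thresholds, but the shape of the problem is the same: raw acoustic signals on one side, a judgment about emotional register on the other.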
That distinction matters.
For many businesses, the bigger risk now is not that the technology cannot do enough. It is that businesses adopt these tools without well-defined processes for when they should be used, when a human should take over, and what should never be delegated to automation in the first place.
That is where failure often enters now.
Not because the systems cannot carry meaning.
Because the surrounding process does not.
A translation system can be sophisticated and still produce poor outcomes if the workflow around it is weak.
So, the question is no longer simply whether a tool can translate.
It is whether the business has designed a process worthy of the system.
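One way to make a process like that concrete is a routing policy: explicit rules for when automation is enough, when a human should watch, and when a human should take over. The sketch below is a hypothetical policy with invented thresholds and categories, not a prescription from any real deployment.

```python
def route_call(translation_confidence: float, emotion_label: str,
               topic_is_sensitive: bool) -> str:
    """Decide how much human involvement a translated exchange needs.

    All thresholds and categories are illustrative assumptions.
    """
    if topic_is_sensitive:
        return "human interpreter"   # some topics are never delegated
    if translation_confidence < 0.8:
        return "human review"        # low confidence: check before delivery
    if emotion_label in {"urgent", "hesitant"}:
        return "human monitor"       # automation continues, a person watches
    return "automated"
```

The point is not the specific numbers. It is that the decisions exist in writing before the tool is deployed, so the system's limits are handled by design rather than discovered in a failed conversation.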
What a system preserves matters more than what it processes.
In translation, that now includes not only words, but signals carried in tone, prosody, and increasingly the rendered voice itself.
That is what made the work worth doing in 2019.
It is what makes it worth doing well now.
The series on conversational intelligence
- Conversational Intelligence: How It Started
- Why Friction Was the Real Problem
- When Words Were Not Enough (you are here)
- What Sentiment Analysis Became
- What AI Can Perceive
- Where Emotion-Aware AI Stops
- Cloud Before the Edge
- How to Add a Second Language
- Voice AI for Your Business
- Monitoring Versus Understanding
- What Comes Next
About Mary Lee Weir
Mary Lee Weir has been building websites for 27 years and digital products in 7 countries. She holds U.S. Patent 11,587,561 B2 for a communication system and method of extracting emotion data during translations, and continues research and development in conversational intelligence. She runs Vero Web Consulting in Vero Beach, Florida, and founded Belize Web and Information Systems at home in Belize to serve Belizean businesses. She writes about AI, search, and the practical realities of building for the web at maryleeweir.com.
If any of this is useful
Book a 60-minute strategy call ($250) to work through how any of this applies to your specific business. Or start with a free 15-minute intro to see whether a longer conversation makes sense.

