Machine translation: from Al-Kindi to the Cold War

Eight centuries later, Descartes proposed a universal language grounded in the similarities between different languages. Other proposals would follow, including some based on Esperanto. But the most concrete advances in the development of machine translation occurred during the Second World War, as secret codes were being deciphered. These methods were then refined during the Cold War, in attempts to translate Russian into English as accurately as possible. Looking back, we could say this was the first, rudimentary version of rule-based machine translation. In other words, the first real machine translation was used for espionage.

Fast-forward seventy years, and developments in IT and computing have caused a veritable revolution. Today, the buzzwords are neural machine translation and deep learning, but there are other types of machine translation as well.

Types of machine translation

We have already briefly mentioned RBMT, or rule-based machine translation. Another type of machine translation is SMT, or statistical machine translation, and then there’s NMT, which stands for neural machine translation. The last of these is the most recent type. The three types are based on completely different principles.

Rule-based machine translation: coding a source language

RBMT is made up of three ingredients: a bilingual dictionary, which links each source word to a word in the target language; the linguistic rules for sentence structure in the source language; and the corresponding rules for the target language. The more detailed the information provided, the higher the quality of the output. After a sentence is entered in the source language, the MT software first analyses its grammatical structure, then translates the individual words based on what the dictionary says, and finally rearranges the words to fit the grammatical structure of the target language. So it isn’t hard to guess why it still goes wrong so often. Most of the time, dictionaries list more than one translation for each word, and RBMT doesn’t take context into account when choosing one over another. It can also make mistakes when analysing a sentence’s grammatical structure. Idiomatic language, such as figures of speech or slang, is usually mistranslated by RBMT software as well.
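
To make the three steps concrete, here is a minimal sketch in Python of a toy English-to-French translator. Everything in it is invented for illustration: the five-word LEXICON, the part-of-speech tags and the single reordering rule are assumptions, and no real RBMT system is this small.

```python
# Toy rule-based translator (English -> French).
# The lexicon, tags and rule below are hypothetical illustration data.
LEXICON = {
    "the": ("le", "DET"),
    "black": ("noir", "ADJ"),
    "cat": ("chat", "NOUN"),
    "house": ("maison", "NOUN"),
    "sleeps": ("dort", "VERB"),
}


def analyse(sentence):
    """Step 1: tag each source word with its part of speech."""
    return [(word, LEXICON[word][1]) for word in sentence.lower().split()]


def transfer(tagged):
    """Step 2: swap each word for its one dictionary translation."""
    return [(LEXICON[word][0], tag) for word, tag in tagged]


def reorder(tagged):
    """Step 3: apply a target-language rule (adjective follows the noun)."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair we just swapped
        else:
            i += 1
    return out


def translate(sentence):
    """Run the three steps in order and join the result."""
    return " ".join(word for word, _ in reorder(transfer(analyse(sentence))))


print(translate("the black cat sleeps"))
print(translate("the house"))
```

The first sentence comes out as "le chat noir dort", which is correct. The second comes out as "le maison" instead of "la maison": the dictionary holds only one translation for "the", and nothing in the pipeline looks at the gender of the noun next to it. That is exactly the kind of context-blind error described above. A word that is missing from the lexicon simply raises a KeyError in this sketch.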

Statistical machine translation: comparing corpora

SMT or statistical machine translation approaches things in a completely different way. It uses neither dictionaries nor grammatical rules. So what does it use? SMT draws on corpora, in both the source and the target language. A corpus is an extensive body of texts about a specific topic. SMT links these corpora together. Simply put, every sentence in language A is linked to a sentence in language B. By comparing corpora, the MT software learns how the two languages relate to each other. It then applies that knowledge to translate other texts. Unfortunately, you need gigantic amounts of text to create corpora, and some languages simply differ too much from one another. This means that, in practice, the method works relatively well for some language combinations, but it is almost entirely useless for others.
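
How can software learn anything from linked sentences alone? One classic technique is IBM Model 1, which estimates how likely each target word is to be the translation of each source word. The sketch below is a bare-bones version under loud assumptions: the three English-German sentence pairs are invented, the function names are ours, and a real SMT system would add phrase tables, a language model and millions of sentence pairs.

```python
from collections import defaultdict

# Hypothetical miniature parallel corpus: (English, German) sentence pairs.
PAIRS = [
    ("the house".split(), "das Haus".split()),
    ("the book".split(), "das Buch".split()),
    ("a book".split(), "ein Buch".split()),
]


def train_ibm1(pairs, iterations=10):
    """Estimate t[(target_word, source_word)] = P(target | source) with EM."""
    target_vocab = {tw for _, tgt in pairs for tw in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # start with no preference at all

    for _ in range(iterations):
        count = defaultdict(float)  # expected co-translation counts
        total = defaultdict(float)  # expected counts per source word
        for src, tgt in pairs:
            for tw in tgt:
                # Share one unit of evidence for tw among the source words,
                # in proportion to the current translation probabilities.
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    share = t[(tw, sw)] / norm
                    count[(tw, sw)] += share
                    total[sw] += share
        # Re-estimate the probabilities from the expected counts.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return t


def best_translation(t, source_word):
    """Return the target word with the highest learned probability."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)


table = train_ibm1(PAIRS)
for word in ["the", "house", "book", "a"]:
    print(word, "->", best_translation(table, word))
```

Nobody told the program that "house" means "Haus". It works that out because "das" also turns up next to "book", which makes "das" a better match for "the" and leaves "Haus" for "house". With only three sentence pairs this is a party trick, which hints at why real systems need such gigantic corpora.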

Neural machine translation: deep learning

Third time’s a charm? NMT or neural machine translation is the youngest and most successful member of the MT family. This new type of machine translation has quickly gained popularity since 2013. It differs from RBMT and SMT in that it draws on neural networks and what is known as deep learning. The best metaphor for these structures would be the human brain. With NMT, the MT software teaches itself to translate by comparing corpora. A bit like SMT, then? Yes, but deep learning makes all the difference: rather than matching words and phrases in isolation, the network learns to take the whole sentence into account. This means NMT results are much more promising than anything SMT or RBMT can achieve. There are drawbacks, though: NMT still needs large amounts of training data, and developing models for deep learning costs a fortune. And deep learning should still be taken with a grain of salt.

Hybrid machine translation: a combination

There’s also HMT or hybrid machine translation, which is simply a combination of different types of machine translation. It does deliver stronger results, though. Linking carefully developed translation memories to machine translation vastly improves the quality of the resulting translations.
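
As a rough illustration of linking a translation memory to machine translation, the Python sketch below first looks for a close match among previously approved human translations and only falls back on the machine when it finds none. The two-entry memory, the 0.85 similarity threshold and the machine_translate stand-in are all assumptions made for this example; commercial tools match and score segments in more sophisticated ways.

```python
import difflib

# Hypothetical translation memory: source segments with approved translations.
TRANSLATION_MEMORY = {
    "Press the red button.": "Appuyez sur le bouton rouge.",
    "Switch off the device.": "Éteignez l'appareil.",
}


def machine_translate(text):
    """Stand-in for a real MT engine (hypothetical, returns a marked stub)."""
    return f"[MT] {text}"


def hybrid_translate(text, threshold=0.85):
    """Reuse a human translation if a close match exists, else use MT."""
    matches = difflib.get_close_matches(
        text, list(TRANSLATION_MEMORY), n=1, cutoff=threshold
    )
    if matches:
        return TRANSLATION_MEMORY[matches[0]], "memory"
    return machine_translate(text), "machine"


print(hybrid_translate("Press the red button"))  # near match in the memory
print(hybrid_translate("Open the lid."))         # nothing similar stored
```

The first call differs from a stored segment only by a missing full stop, so the approved human translation is reused. The second has no close match and goes to the machine. Note that a fuzzy match returns the translation of a similar sentence, not necessarily the right one, so a human still has to check the result.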

Google NMT

There is something else worth noting: GNMT, in which the G stands for Google. In late 2016, Google Translate took a quantum leap forward by introducing its own neural translation networks for certain language combinations. This significantly improved translations. So today American tech giants reign supreme. But however great the progress machine translation has made, it should still be seen as just a tool, albeit one that allows for immense gains in productivity. Translating massive volumes now requires the help of machine translation, but the post-editor (a human translator) still plays a crucial role. Without post-editing, the quality of the translation will still be inferior, and the output is often downright hilarious. Just read any machine-translated manual, and you’ll see there’s still a long way to go…

But what about ‘real’ language?

Machine translation, even in its most advanced form, still pales in comparison to ‘elegant copy’. Well-written texts, literature and commercial copy are simply not suitable for machine translation. As soon as language truly becomes language – that is, more than mere structure and words – the quality of machine translation takes a nosedive, because code simply cannot capture the richness of language. Nuances, regional speech, words with multiple meanings, context-sensitive terms, double entendres and feelings: everything that makes language human is hopelessly lost. Even some of machine translation’s pioneers (Weaver, Bar-Hillel) didn’t believe machine translation would ever come close to the translations humans can produce. So far, they’ve been right: we’re still grappling with the same conundrums those pioneers faced all those years ago.

Conclusion

Machine translation is a wonderful productivity tool, as long as we remember it’s no match for the quality of a human translator’s work. A skilled post-editor remains absolutely indispensable to turn machine-translated output into an acceptable translation. In 2019, even Google announced that Google Translate can’t compete with human translators. Machine translation is already widely used, but mostly as an aid to handle large volumes of text. Remember: it’s still the post-editor who ensures the quality of the final result. Without post-editing, in fact, quality is simply non-existent.

High-quality translators give machine translations short shrift. ‘Translators who can be replaced by machines deserve to be replaced’, is one popular retort, as is ‘Machines translate words, people translate language’.