In part one of this series, we broke down some of the differences between human translation and translation by a program such as Google Translate from a linguistic point of view. In this post, we’ll take a look at some of the more technical aspects of this comparison. What is it, exactly, about the nature of machine translation programs that makes them useful for travelers looking to quickly communicate short concepts to others in a foreign language, but far less desirable for the professional translation of documents?
When our translators mention to acquaintances what they do for a living, they are sometimes met with a lack of understanding as to why their job is necessary in this era of technology. Not infrequently, they hear a variation on the theme of, “I used a translation program on my phone when I went to Europe, and it helped a lot! Why shouldn’t I use it to translate my business emails, manuals, or patent applications?”
There is no question that, in a pinch, particularly for travelers, machine translation programs can be lifesavers (occasionally literally!). Translation programs that do not employ artificial intelligence (AI) have had the vocabulary, grammar, and syntax of a language built into their software, and can usually quite easily assemble simple sentences such as, “Where is the museum?”, “Please help; I am lost,” or, “How much does this _____ cost?” Translation programs that use AI have been fed thousands to millions of examples of translated texts, and the programs learn how to “speak” the language based on these examples and the corrections to their translations that are input by the users. Great strides are being made in these fields – some of them truly groundbreaking – but at the moment, these machine translators are still very much a work in progress when it comes to producing language that truly sounds natural. The following is a partial list of reasons why it’s still best to leave document translation to the human professionals.
- Machine translation programs often skip entire phrases or sentences, or insert gobbledygook into their output. There is no predicting when this will happen, and while we are certain that programmers are working on both of these problems constantly, no apparent solution has come to light as of the date of this post. Whereas gobbledygook is usually fairly easy to identify, an omission of a phrase or sentence is much more difficult to discover if you don’t know both languages involved in the translation. While human translators sometimes do miss a phrase or sentence, a good translation company employs editors, proofreaders, and quality assurance professionals who ensure that these errors are caught before the client ever sees the translation.
- The more specialized the subject matter is, the less likely it is that a machine translation program will get it right. As mentioned in the previous post, the AI in a machine translation program relies heavily on previous source materials for the vocabulary and usage it employs in new translations. Let’s say you need a translation of paperwork from a lawsuit regarding infringement of a patent on a brand-new invention in the field of electrical engineering. While the machine might do reasonably well with any legal boilerplate it has seen before, things will get noticeably rockier as it attempts to deal with the specialized language used to describe the invention. This can sometimes be a challenge even for an experienced human translator with years of electrical engineering experience; it requires the translator to do significant research and visualization, and in extreme cases, it is necessary to work with the client or consult with a specialist to make sure the translation is accurate. Asking a machine with no experience and no ability to research or discuss its work to take on a translation of this nature is a recipe for nonsense at best and disaster at worst.
- The rarer the language is, the more challenges a machine translation program will face when translating it. This is another issue that stems from machine translation programs’ reliance on either prior source materials (in the case of an AI-based program) or on the grammar, syntax, and usage rules input by their programmers (in the case of a program that does not use AI). While languages such as English, Spanish, and Japanese have huge volumes of online source materials and robust documentation of the rules governing the language, other languages have a smaller presence in one or both of those areas. To reiterate, a machine can’t translate what it’s never encountered before.
- Machine translation programs can’t read a scanned PDF or image file on their own, and optical character recognition (OCR) technology is currently far from perfect. None of the considerations listed above will even become an issue if your source file is not in a format the machine translation program can read. A good many of the source texts received by translation companies are scans of documents, and the client does not have access to the editable original: user manuals, medical records, school transcripts, and court rulings are just a few examples of commonly translated documents that fall into this category. A human translator can read the document in image or PDF form, and format the translation to look like the original. A machine needs the help of an OCR tool to be able to read the image file or scanned PDF in the first place. In our experience, OCR programs still have some serious issues, such as frequently “recognizing” lines left by a fax machine or stray specks in a scan as letters and inserting them randomly into the text, or formatting their output in a manner that is confusing and difficult to change, and that sometimes renders part of the text completely invisible. Given some of the issues we’ve discussed in this post, you can imagine what happens when this output is then run through a machine translation program.
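To make the omission problem from the first bullet a little more concrete, here is a minimal sketch of the kind of rough check someone might run on machine output. It is not how any particular translation program or agency works. The function names, the naive sentence splitter, and the 0.5 length threshold are all assumptions made up for illustration. The check can only hint that something may be missing; it cannot confirm an omission or say anything about accuracy, which is why a reviewer who reads both languages is still needed.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naively split text into sentences after '.', '!' or '?' plus whitespace.

    Hypothetical helper: real sentence segmentation is language-specific
    and considerably harder than this.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def flag_possible_omission(
    source: str, translation: str, min_length_ratio: float = 0.5
) -> list[str]:
    """Return warnings suggesting that part of the source may have been skipped.

    The 0.5 default is an arbitrary assumption; sensible ratios vary a lot
    between language pairs.
    """
    warnings = []
    src_sentences = split_sentences(source)
    tgt_sentences = split_sentences(translation)

    # Fewer sentences in the output than in the input is a cheap warning sign.
    if len(tgt_sentences) < len(src_sentences):
        warnings.append(
            f"sentence count dropped from {len(src_sentences)} to {len(tgt_sentences)}"
        )

    # A translation far shorter than its source is another warning sign.
    src_len = len(source.strip())
    if src_len and len(translation.strip()) / src_len < min_length_ratio:
        warnings.append("translation is much shorter than the source")

    return warnings


if __name__ == "__main__":
    source = "Where is the museum? Please help; I am lost. How much does this cost?"
    # The middle sentence is missing from this Spanish output.
    translation = "¿Dónde está el museo? ¿Cuánto cuesta esto?"
    for warning in flag_possible_omission(source, translation):
        print(warning)
```

A check like this misses plenty: a translator, human or machine, may legitimately merge or split sentences, and a skipped phrase inside a sentence will not change the count at all.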
In part three of this series, we’ll take a look at variables in the fee structure of a translation agency, and examine the concept of the post-edited machine translation.