Milestones of Machine Translation: Part 3 — The future of MT

An interview with Team Lead Machine Learning and MT expert Martin Stamenov on the future of machine translation.

Lengoo Marketing Team
Oct 5, 2020

Hollywood films paint a gray, metallic and threatening picture of the future. In reality, technological advances are usually helpful rather than dangerous. The future, with its seemingly endless technological progress, fascinates us, not least because we know so little about it. Today, we shed some light on the matter. After reflecting on the early difficulties and current successes of machine translation, we now round off the series with a look to the future.

To get the best possible insight into the state of research, we asked Martin Stamenov, Team Lead and Machine Translation expert, about his views on how machine translation will develop in the coming years. (Don’t worry: if you aren’t well versed in the jargon, have a look at the glossary at the bottom. Glossary terms are written in italics.)

Martin, what are you and the Machine Translation Team at Lengoo currently working on?

Active learning is a big topic for us. We provide our clients with specialized, customer-specific Engines, based on their TMX files, glossaries, monolingual data or synthetically generated data. As these data sources constantly grow and evolve, the models should do the same and be updated to reflect the latest state of the customer’s knowledge (content).

Why are you so fascinated by your work?

I personally really enjoy being at the cutting edge of innovation, constantly experimenting with the latest developments and actively contributing to shaping them.

But even more, I enjoy applying this mindset to bridging the gap between research and industry. The industry often has needs that typical research metrics cannot measure. For example, most research on MT model architectures assumes a single input stream: the source sentence. That is not enough in a professional translation project, where a translator typically has to consider a variety of inputs while translating a segment, such as glossary matches, partial Translation Memory (TM) matches, etc.
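
One way to give a single-input model access to extra information such as glossary matches is to fold that information into the source stream itself with special tags. The sketch below illustrates this idea under stated assumptions; it is not a description of Lengoo’s system. The tag strings `<term>`, `<trans>` and `</term>` and the function `annotate_source` are hypothetical, only single-word glossary entries with lowercase keys are matched, and a model would only learn to use such tags if it saw them during training.

```python
# Hypothetical sketch: inline glossary matches into the source sentence so that
# a model with a single input stream can "see" them. Tag names are assumptions.
PUNCTUATION = ".,;:!?"


def annotate_source(source: str, glossary: dict[str, str]) -> str:
    """Insert the glossary translation next to each matching source word.

    Assumed annotation format: <term> source-word <trans> target-word </term>.
    Glossary keys are expected in lowercase; matching is case-insensitive and
    limited to single words.
    """
    annotated = []
    for token in source.split():
        core = token.rstrip(PUNCTUATION)  # the word without trailing punctuation
        trail = token[len(core):]         # the stripped punctuation, if any
        target = glossary.get(core.lower())
        if target is None:
            annotated.append(token)       # no glossary match: keep the token as-is
        else:
            annotated.append(f"<term> {core} <trans> {target} </term>{trail}")
    return " ".join(annotated)


if __name__ == "__main__":
    print(annotate_source("Check the brake pads.", {"brake": "Bremse"}))
```

Run as a script, this prints `Check the <term> brake <trans> Bremse </term> pads.` A design choice here is whether to append the target term next to the source term, as this sketch does, or to replace the source term outright; partial TM matches could be folded in the same way, at the cost of ever longer inputs.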

What do you consider the greatest breakthrough in Machine Translation so far?

Many experts were skeptical when Neural Networks first arrived in MT. But when the Attention Mechanism was published and the first claims of human parity started to emerge, it was clear that NMT was here to stay. What I liked about it is that it was inspired by the way a human reads, interprets and translates a sentence, namely by focusing, or paying “attention,” to different portions of it.

I hope to see more human- and linguistically inspired features integrated into training and pre-training methods.
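
The “paying attention” idea can be written down in a few lines. The sketch below shows scaled dot-product attention for a single query in plain Python. This is the variant used in the Transformer; the attention mechanism first used in NMT computed its scores with a small feed-forward network instead of a dot product, so treat this as an illustration of the principle rather than of any particular system. The function names and the toy vectors are made up for the example.

```python
import math


def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    query:  d floats, e.g. the decoder state for the word being generated
    keys:   one d-dimensional vector per source word
    values: one vector per source word (the information passed on)
    Returns (weights, context): how much attention each source word receives,
    and the attention-weighted average of the value vectors.
    """
    d = len(query)
    # Similarity of the query to each source word, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Weighted sum of the value vectors, dimension by dimension.
    context = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, context


if __name__ == "__main__":
    source_vectors = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    weights, context = attention([1.0, 0.0], source_vectors, source_vectors)
    print(weights, context)
```

In the usage example, the first source vector is most similar to the query, so it receives the largest weight, while the weights over all source words still sum to 1. This is the sense in which the model “focuses” on some portions of the sentence more than others.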

How do you assess the current state of research? Do you think the tech giants have an advantage over smaller companies?

In a nutshell: great. The whole NLP field, including MT, is very active in both academia and industry. Everyone publishes their latest breakthroughs and results, and everyone has realized that an open-source-first approach benefits all sides and speeds up progress. There is no secret sauce in the architecture of Machine Translation models.

How do you see the future of Machine Translation?

It depends on whether you are talking about research or industry.

As far as research is concerned, I think we are on the right track. I would like to see more linguistic features embedded in model architectures, and I would like us to move from sentence-level to document-level translation.

As far as industry is concerned, in my opinion, whoever has the better ecosystem has a better chance of succeeding as a Machine Translation provider.

For me, an ecosystem means defining and applying a set of tools and processes that make it possible to get the most out of both the Machine Translation and the human in the loop. In our case, this ranges from seamlessly integrating machine translation into the professional workflow, e.g. via an MT-first CAT tool, to successfully operationalizing ML models, meaning orchestrating training, retraining (online training) and the inference runtime, and, last but not least, to creating a network effect between the models.

Glossary

Accuracy Scores: how accurate is a translation? In theory, this accuracy score can be given as a percentage. In practice, however, measuring it reliably is one of the difficulties of Machine Translation, as Martin explains in his interview.

Attention-Based Neural Networks: these Neural Networks “know” which part of the sentence they should focus on. For example, they do not have to work through every sentence strictly from the first to the last word; depending on the structure of the language, they can also focus on the middle or the end first.

Customer-specific Engines: “bespoke” machine translation systems specialized for individual companies. The engines are trained on the company’s historical data and can therefore be adapted to different industries and to the unique communication style of each company.

Cluster: a collection of data with similar properties. In Machine Translation, semantically similar sentences are clustered.

Data Labeling: the manual “labeling” of data. Image recognition offers a simple example. Before a computer can, for example, distinguish between pictures of different animals, it needs labeled data: humans must first manually record which animal each image shows before these data are used to “teach” the computer. Similarly, with translations, it must be clear which language and which specialist field each text belongs to.

Data Selection: the selection of data to be used is crucial to the end quality of the product. High-quality and representative data are needed.

Human error: the error rate that humans achieve on a task, used as a benchmark for machines. In image recognition, the human error rate was recently measured at 5.1%, whereas a Microsoft AI achieved an error rate of just 3.5%.

Language Models: language models predict the next word in a sequence. They assign a probability to each possible next word, and these probabilities are combined to estimate how likely a whole sentence is.

Property Sets: data has certain properties that can be grouped into larger sets.
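
To make the entry on Language Models concrete: the simplest language model is a bigram model, which estimates the probability of the next word from the previous word alone and multiplies these probabilities to score a whole sentence. The sketch below is a toy illustration; modern MT systems use neural language models rather than word counts, and the function names and example corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram_lm(sentences):
    """Count word pairs and turn them into next-word probabilities P(next | previous)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark the start and end of a sentence.
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev, nexts in counts.items():
        total = sum(nexts.values())
        model[prev] = {word: count / total for word, count in nexts.items()}
    return model


def sentence_probability(model, sentence):
    """Multiply the next-word probabilities along the sentence.

    An unseen word pair gets probability 0.0, so the whole sentence does too.
    """
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, nxt in zip(words, words[1:]):
        prob *= model.get(prev, {}).get(nxt, 0.0)
    return prob


if __name__ == "__main__":
    lm = train_bigram_lm(["the engine starts", "the engine stops", "the car starts"])
    print(lm["the"])
    print(sentence_probability(lm, "the engine starts"))
```

Trained on these three sentences, the model gives “the engine starts” a probability of 1/3 (1 × 2/3 × 1/2 × 1). Any sentence containing a word pair it has never seen, such as “the car stops,” gets a probability of 0, which is why practical count-based models add smoothing.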

About Martin Stamenov

Martin Stamenov studied Computer Science at the Karlsruhe Institute of Technology (KIT). He began working on machine learning during his studies and continued in the field directly after graduating. At Lengoo, he works as Team Lead Machine Learning, driving our machine translation research and development.