Artificial Intelligence Meets Old Cooking Recipes

A program for handwritten text recognition (HTR) deciphers old letters, postcards, and diaries

Freiburg, Nov 25, 2020

Just about everyone has some old letters from grandma or handwritten diaries from a great-aunt lying around at home somewhere. For some people living in Germany today, however, these could be difficult to read because their ancestors wrote in another language, like Russian, Serbian, Ottoman Turkish, or Arabic. This is where the project MultiHTR comes into play. HTR stands for handwritten text recognition. Professor of Slavic Studies Dr. Achim Rabus from the University of Freiburg originally began training MultiHTR, which is an application based on artificial intelligence (AI), to decipher manuscripts written in Old Church Slavonic. In a new project, he is using HTR to decipher pieces of writing like old holiday postcards and recipes that are submitted by the general public. Jürgen Reuß talked to him about his research goals.

Discovering handwritten treasures: Not everyone speaks the language used by their grandparents to write letters. Prof. Dr. Achim Rabus and his team want to help. Photo: Lena Lir/stock.adobe.com 

Prof. Dr. Rabus, you are a Slavic Studies expert currently researching such venerable texts as mediaeval manuscripts written in Old Church Slavonic. How did you get the idea to apply this knowledge and these skills to everyday writing as well?

Achim Rabus: What some people may regard as mundane bits of writing are actually very interesting sources for linguists to work with, because they often reveal how people actually spoke a language at a certain time. When I found out about the call for applications for a grant from the Ministry of Science in Baden-Württemberg that focuses explicitly on smaller departments and the use of artificial intelligence, it was clear to me that we fit the funding criteria extremely well. First of all, Slavic Studies is a comparatively small department. Second, we already have experience applying AI to language. Third, we identified an area that has been little researched to date where we could combine our expertise in our field with our AI skills to benefit the public in a meaningful way.

How do they benefit?

There are many people in Germany who, for whatever reason, have lost touch with their cultural heritage. This is often the case when linguistic traditions don’t get passed down. A good example is our grandparents’ and great-grandparents’ generations, who learned Kurrent or Sütterlin cursive script in school. Because of the different script, their letters and diaries cannot be read by people today, although they were written in German. The program we are using for recognizing old Slavonic handwriting, called Transkribus, can be trained for other languages and scripts as well, including old German cursive handwriting. So, if you have any letters, recipes, or similar handwritten texts at home that you want to understand, we encourage you to contact us on our social media channels.

Achim Rabus invites all who are interested to send in old handwritten texts. Photo: Thomas Kunz

This doesn’t really have much to do with Slavic Studies though, does it?

No, it doesn’t. We also want to make it clear that we were not the ones who trained Transkribus to recognize old German cursive writing. We are offering this service because every program gets better the more you train it. Also, because we are Slavic Studies researchers, we are of course especially interested in reaching the many Germans with roots in Russia or the former Yugoslavia. Some may even speak a little of the language of their grandparents, but if they have roots in the Serbian Orthodox part of the former Yugoslavia, for example, they won’t be able to read the recipes or diaries that their grandparents wrote because these were written in Cyrillic. You can find several examples of the kind of writing people have sent us to decipher on our Instagram page.

There are also examples of Arabic handwriting.

That’s right. My colleague Prof. Dr. Johanna Pink from the Department of Southeast Asian Studies is also involved in the MultiHTR project, and for a good reason. Ottoman Turkish was traditionally written in Arabic script until Atatürk decided to use the Latin alphabet instead. This means that if you grew up in Germany but your grandparents immigrated from Turkey, you won’t be able to read handwriting from the Ottoman era. That is why we want to develop smart Transkribus models that are able to decipher this handwriting and convert it into the Latin alphabet. One considerable challenge in training the AI, however, is that Arabic script, for example, is written from right to left.

You are also collaborating with Prof. Dr. Veronika Lipphardt from Science and Technology Studies. Why is that?

AI does not discriminate. However, the pioneers of AI are programmers in major technology companies in Silicon Valley who have a certain perspective in mind – that of predominately young white males who tend to be nerds – while other people are underrepresented. My colleague Veronika Lipphardt has conducted comprehensive research on the discriminatory effects of AI. The results of her research are very important to us and our text recognition project because we want to avoid any distortions, aberrations, or discrimination from the get-go.

You are also using social media in this project. Why is that?

Our HTR AI gets better and better the more it is trained. We see our online offer as a kind of crowdsourcing. The more people take advantage of the offer and help to correct the results, the more we and subsequent users will benefit.

 

Multilingual Handwritten Text Recognition (MultiHTR) (in German)

 
