Lithuania needs skilled personnel in Language Technologies

According to the Strategic Research Agenda (SRA) for Multilingual Europe 2020, published this year by the Multilingual Europe Technology Alliance, the lack of qualified professionals in Language Technologies is one of the major challenges for many SMEs and research centres in this field, not only in Lithuania but across the whole European Union.

Students from the Vytautas Magnus University are not aware of how Language Technologies are part of their daily life. Picture by Alejandro Izquierdo


By: Alejandro Izquierdo

META project.

META is short for The Multilingual Europe Technology Alliance.

Definition: A network that brings together researchers, private companies, public institutions, language professionals and language technology users with the aim of joining international efforts towards a single European digital market and information space.

Objective: Create a concerted, substantial, continent-wide effort in Language Technology research and engineering to realise applications that enable automatic translation, multilingual information and knowledge management, and content production.

Actions:
2011: Are European languages in Danger?

2012: White papers: At Least 21 European Languages in Danger of Digital Extinction

2013: Strategic Research Agenda 2020: A Europe without language barriers

Vilnius, Lithuania: Ernestas Petkericius and Jonas Ignatavicius are students at the Vytautas Magnus University in the Lithuanian city of Kaunas. Like other students, they are sitting in the second-floor hall of the Faculty of Humanities, using their computers and smartphones. Even though they are hanging out just a few meters from the Centre of Computational Linguistics, they have never heard of language technologies. Yet they use them every day: when they find information with a search engine, when they check spelling and grammar on the computer, when they follow the spoken directions of a navigation system or when they translate web pages via an online service.

– The definition of language technologies is not clear to me, says Jonas Ignatavicius, who is currently studying political science.

He likes to write his text messages with the Lithuanian alphabet, and uses Lithuanian for his GPS and for writing Word documents. He says that he would like to keep using Lithuanian for these purposes.

– I have a tablet which does not have Lithuanian language and it just drives me crazy.

Ernestas represents the other side of the coin. He is doing a bachelor’s degree in East-Asian relations and affirms without hesitation that he doesn’t care about the Lithuanian language being preserved from digital extinction.

– I think about my future and I see that there is actually not much use for the Lithuanian language. If I go abroad I will have to speak English or another language like Japanese, which I am studying.

His computer and mobile phone interfaces are in English.


Doctor Andrius Utka, at the Centre of Computational Linguistics at Vytautas Magnus University, Kaunas, has been working for more than 20 years on the Lithuanian Corpus. Picture by Alejandro Izquierdo

Doctor Andrius Utka is the head of the Centre of Computational Linguistics at the Vytautas Magnus University. Six people work permanently at this research centre, and around 30 more are partially involved in different projects such as the corpus of the Lithuanian language. Andrius Utka has been collaborating with the META project since the beginning. Although he thinks other European countries such as Germany or Spain already have enough qualified professionals in the language technology field and don’t need more of them, his centre in Lithuania faces a lack of new students.

– We have a lot of opportunities to get public funding but not enough competitive people to start more projects. Here in the centre we are already working at full capacity. What we need is more students and professionals, Andrius Utka states.

According to him, there are three main reasons why it is so difficult to attract new students to language technologies. The first one is a simple lack of awareness.

– As you can already see in the corridors of this faculty, young people are not aware of how important language technologies are for their daily life.

The second one is that students just do not feel very attracted to this kind of study.

– There is this sometimes true stereotype that researchers are poor people and linguists are even poorer. The university launched a master’s programme in digital linguistics in 2006, which is accredited until 2015, but it is not very popular.

The third problem is that language technology is a very interdisciplinary field. There is a part of information science and a part of linguistics. Usually these two fields are somehow opposed.

– The computer scientists are afraid of linguistic matters because they are more into programming software and such, while the humanities students are often afraid of computers, Andrius Utka explains.

– Fortunately, there are some crazy people among them that want to do both and if we find them, we try to hook them into our projects.

Speaking as a Proto-Indo-European

Linguists use Lithuanian to reconstruct the sounds of the Proto-Indo-European language by comparing it with other old Indo-European languages such as Latin or Sanskrit.

Sound systems:
• The sound *m from mother: Greek mētēr, Latin māter, Sanskrit mātā, Old Irish māthir, Old High German muoter, Lithuanian motė.

• Sound *s from the verb to sit: Latin sedēre, Sanskrit sadah, Gothic sitan, Lithuanian sėdėti. (The English word sit is related, but notice that it has a final –t where we might expect a –d on the basis of other Indo-European languages.)

• Sound *d from the word God or divine: Latin divus, Sanskrit dēvá’-h, Old Prussian deiw(a)s, Lithuanian Diẽvas.

A very old language

The Institute of the Lithuanian Language in Vilnius works to defend and preserve the linguistic heritage through different projects, such as the Museum of the Lithuanian Language, which teaches children about the importance of their language, or the META project, which combines this heritage with new technologies.

Lithuanian is a very old language and Lithuanians are proud of it. As the director of the Institute, Jolanta Zabarskaité, says in the brochure The Lithuanian Language Heritage: “we proudly quote the French linguist Antoine Meillet, who said that anyone who wanted to hear old Indo-European should go and listen to a Lithuanian farmer”.

Linguists use Lithuanian to reconstruct the Proto-Indo-European language. Source: The Lithuanian Language – Past and Present

The Proto-Indo-European family. Source: The Lithuanian Language – Past and Present

– This idea of Lithuanian being an old language is a bit of mythology, says William R. Schmalstieg from the Pennsylvania State University in an academic article called The Lithuanian Language – Past and Present. However, he states that even if the notion of age with regard to a language is hard to understand, it can be demonstrated in detail that in many ways, and in comparison with other Indo-European languages, the sound system of Lithuanian has not changed very much from that of the Proto-Indo-European language.

There are sounds in contemporary Lithuanian and Latvian that are more similar to ancient Greek, Latin and Sanskrit than the sounds of languages such as French or Spanish, which come directly from Latin but have changed a lot in the course of time (see factbox above).

The director of the institute was at first a bit skeptical about the project because she found it too broad. Now she is happy with the results and says that involving all the target groups has been a genius idea. On the other hand, she agrees that talking about “digital extinction” is a bit metaphorical. Nevertheless, she warns about the risks that minor languages are facing.

– For languages such as Lithuanian, Latvian, Maltese and Icelandic, the market potential is very small and language technologies are quite expensive, so the market cannot take all the responsibility for it. That is why they are in last place in the four different areas analysed in the White Papers (see factbox below).

For this reason, one of the aims of the project is to attract European politicians and “try to kick-start their pride in the language a little bit”, Jolanta Zabarskaité explains. She also gives an example of how, thanks to projects such as META, politicians are more aware of these risks and problems.

 

EU presidency

In the second half of 2013 Lithuania will be the first Baltic state to hold the EU presidency. For the European Day of Languages on the 26th of September, Lithuanian politicians are preparing a special event on the Lithuanian language. They will invite politicians from all over Europe, from the parliamentary committees on information society and cultural development, to discuss the language technologies programme for the whole of Europe.

For Jolanta Zabarskaité, promoting language technologies is not only a matter of cultural heritage but also of “democracy”. When asked if it would not be better to put the resources of META into a project to teach English, she answers:

– To push everyone to learn one language, whatever it is, entails some kind of force, and this is against democracy.

– Young people have the opportunity to learn other languages, but in small villages they may not have these opportunities, and yet they have the right to express themselves in their own language, she says, while agreeing that learning a second and even a third language is more or less essential nowadays.

The Institute of the Lithuanian Language is also trying to promote language technologies among young people and teachers with different campaigns. Language technologies can be used to learn native and foreign languages, Jolanta Zabarskaité explains, but they are also essential “to understand how rich and important multilingualism is for democracy and a multicultural Europe”.

The META network made a cross-language comparison of 30 European languages. It covered four different areas and concluded that 21 European languages are in danger of digital extinction. Source: META Strategic Research Agenda 2020

 

Not so bad

Lithuanian, together with Latvian, Icelandic and Maltese, is one of only four of the 30 European languages analysed in the 2012 White Papers that have weak or no support in all four areas of the cross-language comparison.

Mr. Andrius Utka also has his opinion about the Cross-Language Comparison table.

– I do not think Lithuanian should be in the last column, with “weak or no support”, in all four areas when centres like ours have actually been working for more than 20 years now.

– There are research centres and private companies like TILDE that are also doing their research, so you cannot say that there is no support at all.

According to Andrius Utka, the reason for this pessimism is to make a big issue out of it and show politicians that there is a problem so they can open doors for funding.

– When you are in the last place, everyone becomes very worried. If you do a Google search you will see that Lithuanian media made a big issue of it, saying that ‘We are in the last place in all four categories’, but actually they are veiling the truth. The support is fragmented because some areas are better developed than others, but there is some kind of development in every area, he explains.