We live in a world with massive streams of text and audiovisual content. The human language technology (HLT) project SELMA (Stream Learning for Multilingual Knowledge Transfer) will help media monitors and journalists monitor and make sense of data and content. It will also enable them to enrich audiovisual (AV) output through transcription, translation, voice-over and subtitling, making it more accessible.

The SELMA consortium aims to build a multilingual open-source platform and to develop new methods for training unsupervised deep learning language models.

The platform will process very large volumes of data and will feature a self-learning AI system that can share information about data streams while preserving the inherent value of each language through a novel approach. The idea is to create a cross-lingual common space: the system will always collect and analyze data in the original language and then translate it into another language upon request.

Screenshot of the virtual kick-off meeting, showing 17 participants (3 women, 14 men)

Five European institutions have teamed up to establish the language platform: the Laboratoire Informatique d’Avignon (LIA) at Avignon University, the Institute of Mathematics and Computer Science (IMCS) at the University of Latvia, the Portuguese software company Priberam, the Fraunhofer Institute for Intelligent Analysis and Information Systems, and the international broadcaster Deutsche Welle, which will also lead the consortium.

SELMA will run for three years and aims to produce an open-source platform by the end of 2023.

To keep up with news, research results and prototypes, stay connected with SELMA on Twitter. An official project website will launch soon at selma-project.eu.

#AI #HLT #Accessibility #OpenSource #DeepLearning


(Featured Photo by Josh Sorenson on Unsplash)