Speech Recognition for good, with Team Scheire on Canvas.

Imagine a technology that subtitles Flemish speakers in real time, no matter their accent or dialect. Sounds complicated? It is, but not for the Chatlayer.ai team.

Chatlayer.ai was founded in Antwerp in early 2018 as a spin-off of Faktion and owns the speech technology used in this episode. The company recently became part of Sinch, the global leader in cloud communications for mobile customer engagement.

Last year, Team Scheire from the Canvas TV show approached us with a unique challenge: “We want to subtitle Flemish teachers during their class so that Lola, a non-native student who is still learning Flemish, can read the teacher’s words in real-time and better understand what’s being said.”

A challenge right up our alley! Our team had already been working on a Flemish speech recognition engine after noticing that cloud providers offering Dutch speech recognition got poor results on Flemish, the variant of Dutch spoken in Flanders, Belgium.

To help Lola understand Flemish teachers who use a variety of local dialects in class, we built an application that let her read a real-time transcription of what the teacher was saying. Katrien De Graeve and her colleagues at Microsoft made sure Lola could make notes on the transcriptions, listen back to the recorded audio, and even look up translations or synonyms. It was great to see all of this come together on TV.

You can find the episode here: https://www.vrt.be/vrtnu/a-z/team-scheire/2/team-scheire-s2a6/

Curious about the technology behind our solution for Lola? Keep reading to learn more about the specific challenges we faced and how we solved them using clever technology and great teamwork.

The difficulties of Speech Recognition

Speech recognition converts a spoken audio signal into text. Sounds simple, but it’s actually quite challenging, especially in this case. Every teacher has a different voice, comes from a different region in Flanders, and pronounces words in their own local dialect. Our speech recognition engine had to be robust against all these variations.

Then there is the issue of background noise. Chairs scrape across the floor and students chitchat in the background, which can confuse a speech recognition engine. And since everything is transcribed in real time, there is very little time to filter out the noise before analyzing the teacher's voice.

Finally, most speech recognition engines are designed to recognize general-purpose speech. But every class covers a different subject with its own terminology: the names of dinosaurs in Geography, Watt's steam engine in History, all the theories in Chemistry. Our speech recognition engine didn't know all of these class-specific words, so we had to teach them to it ourselves.

Problem 1: Understanding different voices and their dialects

We were already working on a Flemish speech recognition engine for chat and voice bots, so we had collected a lot of spoken Flemish data from call center conversations, interviews and TV broadcasts. Because that data is so diverse, our speech recognition engine could pick up different accents and dialects quite easily.

Problem 2: Filtering out background noise

Chairs moving around, students talking in the background: a classroom can be pretty noisy. To solve the issue of background noise, we took two steps.

First, we trained our speech recognition engine on a wide variety of audio sources, especially phone conversations, which are often noisy: office sounds, car noise, other people talking in the background. Our engine heard it all.
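The post only says that the engine was trained on naturally noisy recordings such as phone calls. A common complementary technique, which is not necessarily what Chatlayer.ai used, is to mix recorded background noise into clean speech at a chosen signal-to-noise ratio (SNR) so that a model sees many noise levels during training. The minimal sketch below assumes mono audio given as lists of float samples at the same sample rate. The function names `rms` and `mix_at_snr` are hypothetical.

```python
import math
from typing import List


def rms(signal: List[float]) -> float:
    """Root-mean-square amplitude of a non-empty signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(speech: List[float], noise: List[float], snr_db: float) -> List[float]:
    """Add noise to speech so that the mixture has the requested SNR in dB.

    Hypothetical helper for illustration. The noise is looped if it is
    shorter than the speech and truncated if it is longer.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Loop or truncate the noise so that it matches the speech length.
    tiled = [noise[i % len(noise)] for i in range(len(speech))]
    noise_rms = rms(tiled)
    if noise_rms == 0:
        # Silent noise: nothing to mix in.
        return list(speech)
    # SNR_dB = 20 * log10(speech_rms / noise_rms), solved for the noise level.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    scale = target_noise_rms / noise_rms
    return [s + scale * n for s, n in zip(speech, tiled)]
```

In practice, the SNR would typically be sampled at random per training utterance, for example between 0 and 20 dB, so that the model learns to handle both quiet and loud backgrounds.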

For the second part of the solution, we had to find a microphone that suppresses much of this background noise and is affordable for every teacher and school. This is where Katrien De Graeve from Microsoft came in. She tested a range of microphones, from simple AirBuds to more advanced models, and chose a wireless clip-on microphone as the best fit for this case. With that, the background noise problem was solved as well.

Problem 3: Recognizing subject-specific words

The final challenge was training our engine to recognize subject-specific words. Every teacher teaches a different class with its own terminology, which can confuse our engine. We had encountered this problem before when building chat and voice bots. Every bot is designed for a specific use case: a banking bot should understand banking terminology, while a health or travel bot needs a different vocabulary entirely.

We solved this by fine-tuning our speech recognition model with additional vocabulary and sentences related to each subject. Because teachers know in advance which class they will be teaching, they can upload their lesson preparations, which we then use to gather as much information about the subject as possible. For example, we extracted subject-related pages from Wikipedia and used all of that material to fine-tune the speech recognition model so it could understand the new words.
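The post does not describe how the extra vocabulary was fed into the model. As an illustration of a plausible first step only, the sketch below scans subject texts, such as a teacher's preparation or extracted Wikipedia pages, for frequent words that are missing from the recognizer's existing vocabulary. These words are candidates for the extra vocabulary and sentences used during fine-tuning. The function names, the `min_count` threshold and the simple letters-only tokenizer are all assumptions.

```python
import re
from collections import Counter
from typing import Iterable, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of letters, including accented ones (é, ë)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def new_subject_terms(
    texts: Iterable[str], base_vocab: Set[str], min_count: int = 2
) -> List[str]:
    """Hypothetical helper: words that occur at least min_count times in the
    subject texts but are missing from base_vocab, most frequent first."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(tokenize(text))
    candidates = [
        (word, count)
        for word, count in counts.items()
        if word not in base_vocab and count >= min_count
    ]
    # Sort by descending frequency, then alphabetically for a stable result.
    candidates.sort(key=lambda wc: (-wc[1], wc[0]))
    return [word for word, _ in candidates]
```

For example, given the sentences "De stoommachine van Watt dreef de industriële revolutie." and "Watt verbeterde de stoommachine." and a base vocabulary of common Dutch words, the helper would return `["stoommachine", "watt"]`.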


Putting it all together

The final result was an application in which Lola, a non-native speaker, could read a real-time transcription of what the teacher was saying.

All words were recognized correctly, even when there was background noise or other distractions. And since the transcriptions were also saved in the application, Lola could reread them after class and use them to study.

Our work for Team Scheire is a great example of how clever technology can solve everyday problems. We enjoyed working on this challenge and are happy that we could help Lola learn and understand our quirky language a bit better.

Curious to build your own chat- or voice bot? Sign up for a free trial.

This article was contributed by Fréderic Godin and Tess Tettelin.