Those of us who hear well may not realize that video captions are a blessing for people with hearing loss. Some caption tools, like YouTube's, use speech recognition and machine learning to generate captions automatically for uploaded videos. However, the results are not always accurate: even Google admits that mispronunciations, accents, dialects and background noise can reduce the captions' accuracy.
Dimitri Kanevsky, a research scientist at Google who lost his hearing when he was young, uses a tool called Communication Access Real-time Translation, or CART. With this online service, a captioner remotely listens to and transcribes everything spoken in the room, and the transcription appears on his laptop screen. But services like CART are subscription-based and expensive.
To make such technology accessible on smartphones to users like Kanevsky, tech companies are now building live caption features that transcribe speech into text in real time.
The upcoming version of Android, Android Q, will include an optional Live Caption feature that transcribes, in real time, the audio of any video the user plays. It will not be limited to YouTube: it will work on top of any social media app, podcasts, offline movies and even live video chats.
The captions will be generated using on-device machine learning, so they will work even when the user is offline. Live Caption on Android Q will work even when the phone's audio is turned down, and users will also be able to save transcripts of the captions.
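The user-facing behavior described above, a short caption line that scrolls as new words arrive while the full transcript is kept for saving, can be sketched in a few lines. This is a hypothetical illustration, not Android's implementation: the `LiveCaptionBuffer` class and its methods are invented here, and the on-device speech model is simulated by feeding in already-recognized text.

```python
from collections import deque


class LiveCaptionBuffer:
    """Illustrative sketch of a live-caption display: a rolling
    window of recent words on screen plus a full saved transcript.
    (Hypothetical; real systems run an on-device speech model.)"""

    def __init__(self, max_words=8):
        self.window = deque(maxlen=max_words)  # words currently shown on screen
        self.transcript = []                   # every word ever recognized

    def feed(self, recognized_text):
        # Append newly recognized words; the oldest words scroll off screen
        # automatically because the deque has a fixed maximum length.
        for word in recognized_text.split():
            self.window.append(word)
            self.transcript.append(word)

    def on_screen(self):
        return " ".join(self.window)

    def saved_transcript(self):
        return " ".join(self.transcript)


buf = LiveCaptionBuffer(max_words=4)
buf.feed("hello world this is")
buf.feed("a live caption demo")
print(buf.on_screen())          # → a live caption demo
print(buf.saved_transcript())   # → hello world this is a live caption demo
```

The fixed-size window mirrors why captions stay readable: only the last few words are displayed, while the transcript preserves everything for later.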
In addition to YouTube, live captions now also work in Google Slides. Google Research has also launched an experimental app called Live Transcribe, developed in collaboration with Gallaudet University, a leading US university for deaf and hard-of-hearing students. When switched on, the app transcribes any sound and speech and shows it on the screen in real time. We found the results quite inaccurate and unresponsive, but the app is still in its early stages and there is room for improvement. To be sure, Microsoft already offers AI-powered live captions and subtitles in Skype, letting users read the conversation in an auto-scrolling feed while the person on the other end of the video call is talking.
Microsoft is also working on real-time translation of captions in 20 languages, which will let users understand what the other person is saying by reading live subtitles in a language of their choosing. Live captions also work for PowerPoint presentations.
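Translated captions add one stage to the pipeline: recognized speech is passed through a translation step before being shown as subtitles. The sketch below is purely illustrative, not Skype's implementation: the `recognize` callback and the word-for-word `glossary` translator are stand-ins for the real speech-recognition and machine-translation models.

```python
def translate(text, glossary):
    # Stub translator: a real system would call an ML translation model.
    # Here, words found in the glossary are swapped; others pass through.
    return " ".join(glossary.get(word, word) for word in text.lower().split())


def captioned_call(audio_chunks, recognize, glossary):
    # Pipeline: speech recognition -> translation -> subtitle lines.
    subtitles = []
    for chunk in audio_chunks:
        text = recognize(chunk)  # speech-to-text (model stubbed by caller)
        subtitles.append(translate(text, glossary))
    return subtitles


# Toy glossary standing in for an English-to-Spanish translation model.
glossary = {"hello": "hola", "friend": "amigo"}
subs = captioned_call(["Hello friend"], lambda chunk: chunk, glossary)
print(subs)  # → ['hola amigo']
```

Keeping recognition and translation as separate stages is what makes it possible to offer subtitles in many target languages from a single transcription.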
Offering live captions, whether in the cloud as with Skype or on the device as with Android Q, requires a lot of computational power. That is one reason the Live Caption feature will be limited to high-end models. The accuracy of the transcriptions will also depend on users' pronunciation and the quality of the machine learning algorithms. The results may be inconsistent in the beginning, but with time, live captions will become one of the most powerful tools for people with hearing loss.