Lip-Reading System from Oxford Outperforms Professional Lip-Readers

The BBC News article “Towards a Lip-Reading Computer” explores an invention by Oxford scientists that is touted for its superior-to-human lip-reading capabilities. The “Watch, Attend and Spell” system was developed in partnership with Google’s DeepMind AI group and trained to lip-read using BBC news footage. It claims a 50% success rate for correctly reading the lips of news anchors, compared to just 12% for professional lip-readers. Doctoral student Joon Son Chung explains,

“What the system does…is to learn things that come together, in this case the mouth shapes and the characters and what the likely upcoming characters are.” After examining 118,000 sentences in the clips, the system now has 17,500 words stored in its vocabulary.

Because of its news-specific training, the system does much better with common phrases like “Prime Minister.” It will need a great deal of exposure to other TV channels before it can claim real fluency. That said, the scientists at Oxford and the charity group Action on Hearing Loss are optimistic about the future. Potential real-world applications include better subtitles, a better ability to instruct smartphones in loud environments, and even improvements to other areas of speech recognition. The article does point out that no one thinks professional lip-readers should be concerned, in spite of being outpaced and outperformed by the technology. At least for now.

Chelsea Kerwin, April 19, 2017