The Pentagon’s blue-sky researchers are funding a project that uses crowdsourcing to improve how machines analyze our speech. Even more radical: Darpa wants to make systems so accurate, you’ll be able to easily record, transcribe and recall all the conversations you ever have.
Analyzing speech and improving speech-to-text machines has been a hobby horse for Darpa in recent years. This project goes a step further, exploring how crowdsourcing can make it possible for our speech to be recorded and stored forever. And it’s not just about better recordings of what you say. It’ll lead to more recorded conversations, quickly transcribed and then stored in perpetuity, like a Twitter feed or e-mail archive for everyday speech. Imagine living in a world where every errant utterance you make is preserved forever.
University of Texas computer scientist Matt Lease has studied crowdsourcing for years, including for an earlier Darpa project called Effective, Affordable, Reusable Speech-to-Text, or EARS, which sought to boost the accuracy of automated transcription machines. His work has also attracted enough attention for Darpa to award him $300,000 over two years to study the new project, called “Blending Crowdsourcing with Automation for Fast, Cheap, and Accurate Analysis of Spontaneous Speech.” The project envisions a world that is both radically transparent and a little freaky.
The idea is that business meetings or even conversations with your friends and family could be stored in archives and easily searched. The stored recordings could be held on servers owned either by individuals or their employers. Lease is still playing with the idea, one with huge implications for how we interact.
“In their call, what [Darpa] really talked about were different areas of science where they would like to see advancements in certain problems that they see,” Lease told Danger Room at his Austin office. “So I responded talking about what I saw as this very big both need and opportunity to really make conversational speech more accessible, more part of our permanent record instead of being so ephemeral, and really trying to imagine what this world would look like if we really could capture all these conversations and make use of them effectively going forward.”
How? The answer, Lease says, is in widespread use of recording technologies like smartphones, cameras and audio recorders, a kind of “democratizing force of everyday people recording and sharing their daily lives and experiences through their conversations.” But the trick to making the concept functional and searchable, says Lease, is blending automated voice analysis machines with large numbers of human analysts through crowdsourcing. That could mean involving people “strategically” to clean up transcripts where the machines made mistakes. Darpa’s older EARS project relied entirely on automation, which has its drawbacks.
“Like other AI, it can only go so far, which is based on what the state-of-the-art methodology can do,” Lease says. “So what was exciting to me is thinking about going back to some of that work and now taking advantage of crowdsourcing and applying that into the mix.”