Google, Amazon, and Apple say their AI-powered virtual assistants make it easier to get things done on smartphones or at home. Last month, a couple in the Waasmunster area of Belgium got an unexpected lesson in how these supposedly automated helpers really work.
Tim Verheyden, a journalist with Belgian public broadcaster VRT, contacted the couple bearing a mysterious audio file. To their surprise, they clearly heard the voices of their son and baby grandchild—as captured by Google’s virtual assistant on a smartphone.
Verheyden says he gained access to the file and more than 1,000 others from a Google contractor who is part of a worldwide workforce paid to review some audio captured by the assistant from devices including smart speakers, phones, and security cameras. One recording contained the couple’s address and other information suggesting they are grandparents.
Most recordings reviewed by VRT, including the one referencing the Waasmunster couple, were intended; users asked for weather information or pornographic videos, for example. WIRED reviewed transcripts of the files shared by VRT, which published a report on its findings Wednesday. The broadcaster says that in roughly 150 of the recordings, the assistant appears to have activated incorrectly after mishearing its wake word.
Some of those captured fragments of phone calls and private conversations. They include announcements that someone needed the bathroom and what appeared to be discussions on personal topics, including a child’s growth rate, how a wound was healing, and someone’s love life.
Google says it transcribes a fraction of audio from the assistant to improve its automated voice-processing technology. Yet the sensitive data in the recordings and instances of Google’s algorithms listening in unbidden make some people, including the worker who shared audio with VRT and some privacy experts, uncomfortable. Privacy scholars say Google’s practices may breach GDPR, the European Union privacy rules introduced last year, which provide special protections for sensitive data such as medical information and require transparency about how personal data is collected and processed.
VRT began talking with the Google contractor in the wake of a report by Bloomberg that described how audio from Amazon’s Alexa—including unintended recordings—is transcribed by company staff and contractors in locations including Boston, Costa Rica, and India. The Google contractor said that he transcribed around 1,000 clips per week in Dutch and Flemish, and that he was concerned by the sensitivity of some of the recordings. He showed VRT how he logged into a private version of a Google app called Crowdsource to access recordings assigned to him.
In one case, the contractor said, he transcribed a recording in which a woman sounded like she was in distress. “I felt that physical violence was involved,” he said in the English subtitles on VRT’s video report. “It becomes real people you are listening to, not just voices.” The contractor goes on to say that Google had not provided clear guidelines on what, if anything, workers should do in such cases.
In a statement, a Google spokesperson said the company has launched an investigation because the contractor breached data security policies. The statement said Google uses “language experts around the world” to transcribe audio from the company’s assistant, but that they review only around 0.2 percent of all recordings, and that the reviewed clips are not associated with user accounts.
Google’s reviewers may not see account data, but they still get to hear very private information, including details related to health. Jef Ausloos, a researcher at the Centre for IT & IP Law at the University of Leuven, in Belgium, told VRT that means Google’s system may not comply with GDPR, which requires explicit consent to collect health data.
Michael Veale, a technology policy researcher at the Alan Turing Institute in London, says Google’s disclosures to users don’t appear to meet GDPR requirements even for data not considered sensitive. The group of national data protection regulators in charge of applying GDPR has said companies must be transparent about data they collect and how it is processed. “You have to be very specific on what you’re implementing and how,” Veale says. “I think Google hasn’t done that because it would look creepy.”
The Google spokesperson said the company will review how it could clarify to users how data is used to improve the company’s speech technology.
Veale has filed a complaint about Apple’s Siri with the Irish data regulator, arguing that the service breaches GDPR because users cannot access recordings made by Siri. He says Apple has responded that its systems handle the data carefully enough that the audio files of his own voice don’t count as personal data. Google and Amazon allow users to both review and delete their recordings; Amazon now allows users to call out, “Alexa delete everything I said today,” to purge their history.
Amazon’s privacy policies don’t describe how reviewers handle some Alexa audio. Like Google’s, its privacy pages say Alexa does not record all conversations, but don’t explain that it may inadvertently eavesdrop. Apple’s documents don’t describe reviewing processes either, although a security white paper says some Siri audio is retained for “ongoing improvement and quality assurance.” Amazon and Apple declined to comment.
Corrected, 7-10-19, 7pm ET: The Google contractor who spoke with Belgian TV said he reviewed 1,000 audio clips a week. An earlier version of this article said he reviewed 1,000 clips a month.