NSA conducts real-time analysis of spoken communications

Zaida Green
8 May 2015

Computer programs used by the National Security Agency (NSA) automatically analyze intercepted voice communications in real time, allowing NSA analysts to process what agency officials call an ever-increasing “tsunami” of data, according to documents released by whistleblower Edward Snowden and published this week by The Intercept.

One of the documents, an internal memo from 2006 entitled “For Media Mining, the Future Is Now!”, boasts that the use of human language technology (HLT) would enable a single NSA analyst “to sort through millions of [voice] cuts per day,” allowing the amount of communications collected to be increased by “orders of magnitude.”

Communications are transcribed, translated (if desired), analyzed, and tagged as they are intercepted, allowing NSA analysts to use keywords and other search terms to sift through the enormous amounts of data vacuumed up by the agency’s mass surveillance programs. The technology can also identify dialect, gender, and individual speakers. The real-time processing allows analysts to receive alerts when incoming data fits search criteria.

Standard tasks of communications surveillance, such as the generation of frequent calling lists, historical trends and geospatial maps, have been automated by the integration of HLT systems with other surveillance systems. This enables the construction of complex queries that include phone numbers, locations and time periods.

One “important enhancement under development” in 2006 was the automatic flagging of “interesting” data based on analysts’ past behavior. Citing the ability of popular online retailers to track and predict buyer preferences, one memo envisions a future where, each day, “the best five intercepts… [sit] at the top of your queue waiting for you.”

The documents released by Snowden reveal extensive usage of HLT systems in Iraq, Afghanistan, Mexico, and other parts of Latin America—regions devastated by US imperialism. In 2005, the NSA deployed the Real Time Regional Gateway (RTRG) program in Iraq to intercept, tag and store “every Iraqi text message, phone call, and e-mail.”

The RTRG program became the model for PRISM, the NSA’s primary data mining program. Documents leaked by Edward Snowden in 2013 revealed that all the major Internet companies in the United States, including Yahoo, Google, Facebook, Microsoft, and Apple, were active collaborators in the PRISM program.

Jennifer Granick, the civil liberties director at the Stanford Center for Internet and Society, speaking to The Intercept, warned that domestic communications could just as easily be analyzed and stored. “It may not be what they are doing right now, but they’ll be able to do it. We don’t have any idea how many innocent people are being affected, or how many of those innocent people are also Americans.”

NSA whistleblower Thomas Drake told The Intercept that there was a huge push after the terrorist attacks of September 11, 2001 to efficiently process the massive amounts of voice communications being collected. “The breakthrough was being able to [convert speech to text] on a vast scale.” Though the transcriptions were not perfectly accurate, “I can still get a lot more information. It’s far more accessible. I can search against it.”

The first generation of technology capable of keyword searching of speech, or what one document likened to “Google for Voice”, was code-named RHINEHART and deployed in 2004, the first full year of the US invasion of Iraq. Another system in use during 2004 was EViTAP, a TV news-monitoring tool that automatically translated and transcribed live news sources in English and Arabic.

Over the next decade, the NSA’s HLT Program Management Office worked to produce faster, more sophisticated systems.

By 2006, RHINEHART was operating “across a wide variety of missions and languages” and was used throughout the NSA and Central Security Service. That year the NSA launched its RT-10 initiative, which had the express goal of “reducing the time between collection and the generation of actionable intelligence by an order of magnitude in each spin of the project.” The VoiceRT system, RHINEHART’s successor, was released shortly thereafter.

By 2008, the EViTAP system had expanded to cover four more languages: Russian, Spanish, Mandarin Chinese and Persian. EViTAP has since expanded to cover eight more languages, including Hindi, Urdu and Bahasa Indonesia. A version of the system has been made available commercially.

In 2012, VoiceRT was decommissioned and replaced by a system apparently code-named SPIRITFIRE, which provided “a more robust voice processing capability based on speech-to-text keyword search and paired dialogue transcription.”

None of the public reports released by the Privacy and Civil Liberties Oversight Board, appointed by President Obama, contained any reference to speech-to-text technology. The toothless panel, ostensibly “committed to the protection of civil liberties and privacy,” upholds the fraudulent claim that the mass surveillance being carried out by the US government is done in the interests of protecting the American public from terrorism.

As the United States government’s relentless and brutal campaign of austerity at home and war abroad makes clear, the ultimate target of the NSA’s spying programs is the working class at home and throughout the world.
