By Bill Boyarsky
One of the most disturbing aspects of the National Security Agency surveillance scandal is the way government has reportedly worked with private companies such as Yahoo, Facebook and Google. Those companies have disputed allowing the government direct access to their servers, as was originally reported by The Washington Post and The Guardian. The technology firms, Google foremost among them, have even pressed the government to be more transparent. However, it was with their help that the NSA devised a system to sort the massive amounts of data swept up by its snooping.
The system, called Hadoop, was first disclosed by The Wall Street Journal and explored in depth last week by Salon. It is a way of storing, processing, classifying and analyzing billions of phone numbers, emails, texts, addresses, motor vehicle registrations, births, deaths, purchases and all of the other data we generate in our daily lives, speeding the information into the computers of such enterprises as Google, Facebook and Yahoo, as well as into the NSA database. The question to be answered is just what is being done with the data.
“Hadoop’s importance to how we live our lives today is hard to overstate,” Andrew Leonard wrote in Salon on Friday. “By making it economically feasible to extract meaning from the massive streams of data that increasingly define our online existence, Hadoop effectively enabled the surveillance state. And not just in the narrowest, Big Brother, government-is-watching-everyone-all-the-time sense of that term. Hadoop is equally critical to private sector corporate surveillance. Facebook, Twitter, Yahoo, Amazon, Netflix—just about every big player that gathers the trillions of data ‘events’ generated by our everyday online actions employs Hadoop as a part of their arsenal of Big Data-crunching tools. Hadoop is everywhere—as one programmer told me, ‘it’s taken over the world.’ ”
I was introduced to Big Data during the last presidential campaign by Sasha Issenberg’s articles in Slate and in his book “The Victory Lab.” Issenberg described how a new generation of political campaigners was analyzing all this information to target voters, direct specific messages at them and make sure they got to the polls on Election Day. The process helped President Obama win re-election.
Their data came from sources as simple as voter rolls and birth certificates, but it also drew on untold amounts of more sophisticated information compiled by firms that accumulate all kinds of particulars on individuals. One such company was Acxiom, which, Issenberg wrote, has gathered millions of details from Lands’ End and other retailers, financial institutions such as Charles Schwab, auto dealers and magazine publishers and distributors. Much of the data included phone numbers and addresses.
If it was an invasion of privacy, at first it didn’t matter. There was just too much information. The accumulators could not keep track of what they had. Political campaigns weren’t the only ones with the problem. Unmanageable data was pouring into Facebook, Google, Yahoo, other businesses—and the Central Intelligence Agency and the National Security Agency.
That’s where Hadoop came in.
I learned about its development in a series called “The History of Hadoop” by Derrick Harris on the technical business website GigaOM.
Harris reported that in 2002, Doug Cutting, an Internet search expert, and Mike Cafarella, a University of Washington graduate student, began work on an improved search engine. (Hadoop was named after Cutting’s son’s stuffed elephant.) At the time, Harris wrote, there were about a billion Web pages compared with what Wired magazine estimates as a trillion at present.
Yahoo worked with Cutting and Cafarella, as did Google and others. It was open source development; anyone with an idea and ability could jump in. The NSA and the CIA were among those that did. As a result, Leonard wrote, “The spooks and the social media titans and the online commerce goliaths” have collaborated to track our behavior in “fantastically intimate ways. ...”
So our privacy was shattered long before Edward Snowden’s revelations about snooping by the NSA, for which he worked while employed by private contractor Booz Allen Hamilton.
Snowden was courageous in disclosing what he knew about the NSA and its PRISM project. But the outrage sparked by his leak has been centered on the NSA and the CIA. Nobody complains about the social media on which we volunteer every passing thought and record our life’s events, great and small. Nor do we refrain from buying goods from Amazon.
A blogger saw the danger in the collaboration of such powerful and pervasive commercial enterprises with the much more powerful government security organizations. This is how it was put in a post on the Center for Digital Democracy’s website June 11:
“The dramatic global expansion of pervasive data collection by Google, Facebook, Yahoo and the digital marketing industry has created a commercially oriented surveillance system. But it’s no surprise that such immediate access to our lives—including rich details gathered by tracking our mobile device use (including location) and communications with friends (social media), as well as reams of first and third party data about our finances, health, personal and professional interests make it fertile territory for government collection.”
The huge data-gathering businesses are co-conspirators with the NSA and the CIA in the development and expansion of the surveillance state. They worked together to produce the Hadoop system that made their surveillance sweeps of the Internet possible. Now these businesses are trying to distance themselves from their government allies. I hope they don’t get away with it.
AP/Marcio Jose Sanchez
The Facebook “like” symbol is on display on a sign outside the company’s headquarters in Menlo Park, Calif.