Last weekend, bloggers discovered a file that AOL had posted on a public research website, containing 20 million search queries from 657,427 subscribers, collected between March and May. Each user was assigned a number, presumably to protect anonymity, but all searches were grouped together according to the user who made them, so by scrutinizing a person's search history it is possible in some cases to deduce that person's identity.
For instance, if User #458372 searched for the names of John Doe's friends and coworkers along with his address and e-mail, you can presumably deduce that User #458372 is in fact John Doe. And if those searches also include phrases like "how to grow weed in my garage" or "how to pass a drug test," then suddenly you've learned some very private facts about John Doe's life.
Several bloggers have already discovered specific examples of this in the data, and a few are even claiming to have identified specific people whose search data is included. The reason I'm using a hypothetical example rather than simply citing those individuals is that I don't want to exacerbate these people's victimization — and that point is why I'm writing this column.
The same bloggers who discovered this data and objected that it constituted a gross violation of privacy are now passing the data around. AOL has removed the file from its website, but the bloggers had already downloaded it and now they've set up mirrors to share it with anyone who goes looking. These same people who are excoriating AOL for hurting its users are simultaneously combing through this data like voyeurs, playing detective to guess the identities of unsuspecting people and announcing their results on the Web, and generally making a bad situation worse.
By finding AOL's mistake, the bloggers did perform a service; but the harm they've done since grossly outweighs that good. Instead of bringing their discovery directly to AOL (or to a major newspaper, which presumably would have confirmed the story with AOL and allowed the company to remove the data before revealing its existence to the world), they chose to declare their findings on community weblogs and invite everyone to download the file, because the only thing more important to these bloggers than condemning AOL was getting credit for the discovery and driving traffic to their websites. AOL screwed up and should fire the technician who published this information, but its carelessness is far outstripped by these bloggers' deliberate indifference, and that's the greater tragedy here.