Earlier this month, Web company Google demonstrated how it could mine its own search tracking database to track the spread of influenza across the US. By analysing how often key search terms such as 'flu symptoms' or 'flu medicine' are entered within a given geographic area, Google is able to track the incidence and spread of flu within a region. Although a search for flu-related symptoms doesn't necessarily mean that the searcher has influenza, there is a remarkable correlation with real illness data.
Google compared five years of its historical search data with actual data on flu incidence collected by the US Centers for Disease Control and Prevention (CDC) via a network of doctors and other health professionals. The search graphs matched the CDC's almost perfectly. The advantage of the Google approach is that it can give health authorities early warning of a possible flu outbreak so that they can take appropriate action. The CDC method takes much longer to collate the data, by which time it could already be too late.
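The comparison described above can be illustrated with a correlation coefficient between two weekly series. The following is a minimal sketch, not Google's actual method: the function name and the numbers are made up for illustration, and it simply assumes one series of weekly flu-related query counts and one series of weekly CDC-style illness percentages.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

# Hypothetical numbers, invented for this example only.
weekly_flu_queries = [120, 150, 310, 480, 450, 260, 140]      # query counts per week
weekly_illness_pct = [1.1, 1.3, 2.6, 4.0, 3.8, 2.2, 1.2]      # % of doctor visits

r = pearson(weekly_flu_queries, weekly_illness_pct)
print(f"correlation between searches and reported illness: {r:.3f}")
```

A value close to 1 would mean the two curves rise and fall together, which is what "matched almost perfectly" suggests. A high correlation alone says nothing about any individual searcher.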
While this system can help track the normal incidence of winter flu, there is some doubt whether it can help much with epidemic outbreaks such as the Asian avian flu pandemic. In these cases, the media furore surrounding the outbreak will probably have everyone turning into a web hypochondriac, swelling flu-related searches regardless of actual illness.
But this type of data mining raises some pretty serious privacy issues, says analyst Ovum. Most people would agree that fighting influenza pandemics is in the public interest. The concern Ovum raises is how far this could go, particularly if the data ends up being attributable to an individual. Does looking up search terms really serve as evidence that an individual has influenza? Would they perhaps be denied the right to travel? Even more ominously, the same technology could be used to identify other public health issues. Would looking for AIDS symptoms, for example, mark an individual as a possible AIDS carrier in their digital footprint?
Google, of course, is aware of these privacy issues, and says, "Google Flu Trends can never be used to identify individual users because we rely on anonymized, aggregated counts of how often certain search queries occur each week." But this could easily change if there were a flu pandemic, which is one of the greatest threats facing most countries today. If there were an outbreak, authorities would do everything in their power to stop the spread of the disease. This could also include using Google search terms, as already happens in the fight against terror. Once that genie is out of the bottle, it will be impossible to put it back.
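To make "anonymized, aggregated counts" concrete, here is a minimal sketch of what such aggregation could look like. It is an assumption for illustration, not a description of Google's system: the record layout, the function name `aggregate_counts` and the `min_count` threshold are all hypothetical.

```python
from collections import Counter

def aggregate_counts(records, min_count=3):
    """Count queries per (week, region, query) and drop the user identifier.

    records: iterable of (user_id, week, region, query) tuples (hypothetical layout).
    min_count: cells with fewer occurrences are suppressed (assumed threshold).
    """
    # The user_id is deliberately ignored; only the grouped key is counted.
    counts = Counter((week, region, query) for _user, week, region, query in records)
    # Suppress small cells, since a count of 1 or 2 may point to one person.
    return {key: n for key, n in counts.items() if n >= min_count}

# Invented sample log, for illustration only.
sample_log = [
    ("u1", "2008-W45", "CA", "flu symptoms"),
    ("u2", "2008-W45", "CA", "flu symptoms"),
    ("u3", "2008-W45", "CA", "flu symptoms"),
    ("u4", "2008-W45", "NV", "flu medicine"),
]

print(aggregate_counts(sample_log))
```

Dropping identifiers and suppressing small counts is a common precaution, but it does not by itself prove that the output cannot be linked back to a person.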

Yes, Google's flu tracking poses serious privacy issues.
What if the government asked Google to hand over five years of tracking data showing every time someone searched for gun sales? Google has that data too. Maybe they should share it for the public good; that was the reason given for sharing flu tracking data.
But why blindly accept Google's assertions that the data was truly de-identified?
Recently, the NIH had to shut down access to a genetic database that they thought contained safely de-identified records. They were wrong. See: http://www.contracostatimes.com/nationandworld/ci_10331125?nclick_check=1
Netflix released a database of 500K people's movie ratings that computer scientists proved was re-identifiable. Netflix is unrepentant. See: http://arxivblog.com/?p=142
And remember the supposedly de-identified AOL searches that amateur sleuths re-identified? See: http://news.cnet.com/8301-13739_3-9826608-46.html
The Electronic Privacy Information Center and Patient Privacy Rights recently wrote Google a letter asking it to reveal the algorithms it used for de-identification, to prove the data REALLY was de-identified. Google would not. See: http://www.patientprivacyrights.org/site/DocServer/EPIC-PPR_re_GoogleFluTrends_11-08.pdf?docID=4421
Those who claim that our data is safe and private should have to prove it. Actually, it is hard to de-identify health data.
Why should the public trust the data mining industry when health data is so incredibly valuable?
Did you realize that every prescription from all 51,000 pharmacies in the US has been data mined and sold daily for over 10 years? Most people have no idea.
The three top prescription data mining corporations reported revenues of $65 billion in 2007. See Fortune 500's data on their revenues at: http://money.cnn.com/magazines/fortune/fortune500/2008/snapshots/10630.html
For information about how to help stop the systemic theft of your personal health data and restore your right to control personal health records, see: www.patientprivacyrights.org
Deborah C. Peel, MD
Founder and Chair, Patient Privacy Rights