Random data can produce false leads

Net Results: Keyword search data from search engines like Google, MSN, Yahoo and AOL can be fascinating and revealing. I used such data in a recent article I wrote about the Irish Playography - www.irishplayography.com - a website with information on every play professionally produced on this island since 1904.

I asked for the information from website designers X Communications, who created and manage the site, because it would give an insight into who and what captivated site users.

It seems Brian Friel is the playwright of the moment, coming in as the number one search bringing people to the playography. John Breen's Alone It Stands is the top play - indicating that there must be a lot of Irish users, as the play has been a hit here in particular.

Actually, I don't know whether this is true. It is likely to be true, based on a common-sense assessment and some basic knowledge of literature. I am making an assumption based on disconnected data. I don't know exactly who came to look at the playography or why.

Should that matter? In this case, the answer is probably no. The data is interesting because it tells us about how random people all over the world use a particular website.

Revealing that lots of people want to know more about Brian Friel is not sensitive information, and even if you found out I was one of those who'd searched for information on Friel, it reveals little more than my taste in playwrights.

However, an interest in another contemporary Irish writer landed a friend of mine in an interrogation room at Heathrow airport after she had arrived there, very tired, on a flight from the US.

She had notecards on Seamus Heaney, her master's thesis topic at an Irish university, in her handbag. She had an Irish surname. Her father's name was Emmet. She didn't yet have the requisite student visa in her passport.

In the post-hunger strike years of the mid-1980s, these random, disconnected factoids made the British suspicious. She was detained and questioned, missing her flight to Dublin, despite the fact that she was in a business suit and had a business card for the major US university where she was employed.

Now, hold that thought while we look at the keyword search data AOL placed online last week: a collection of more than 20 million search enquiries from 650,000 AOL members. The files were removed by AOL after a public outcry that such information, even if disconnected from an individual's name, could be a breach of users' privacy.

To understand why, read News.com writer Declan McCullagh's story, where he pieces together fragments of people's lives based on search queries.

One user's searches reveal that a man living near Charlton, Massachusetts, apparently went through a divorce, found an apartment, fought for child custody, and looked for ways to get revenge on his wife.

Likewise, McCullagh guesses that AOL user 710,794 is an overweight golfer, owner of a 1986 Porsche 944 and 1998 Cadillac SLS, and a fan of the University of Tennessee Volunteers men's basketball team. You can read more at http://tinyurl.com/hymot.

Many of the searches reveal darker sides, too. Some people look for information on committing suicide. Or how to strangle someone. Or for websites on disturbing sexual practices, drugs, criminal activity and child pornography.

It makes for fascinating reading, but what's clear is how tempting it is to draw conclusions about random searches - just as the British officers found random information connected to my friend "incriminating".

How easy it is to assume that an activity without context is suspicious. And how revealing search data can be about an individual's thoughts and activities.

If you think such information here is safely hidden away from similar analysis, think again. An incoming EU directive on data retention will allow the Government to require internet service providers to store all web usage information on every Irish resident and citizen for several years. And under current Irish data retention law, such sensitive and revealing information can be obtained by the Garda even for a misdemeanour.

That's why privacy watchdog Digital Rights Ireland's current legal challenge to our data retention law and the incoming EU directive is so important.

Random facts easily take on the semblance of meaningful order when assumptions are made. And as some of our tribunals have shown, serious miscarriages of justice happen when assumptions are mistaken for fact.

Karlin Lillington

Karlin Lillington, a contributor to The Irish Times, writes about technology