Sunday, March 12, 2006

Why Data Mining Won't Stop Terror

I read an interesting article on Wired News about why the NSA data-mining project is a bad idea. Bruce Schneier's point is that the number of false positives, even in an unrealistically accurate system, would make (and has made) it useless. These technical difficulties, combined with the threat to our civil liberties, should make this a non-starter.

This isn't anything new. In statistics, it's called the "base rate fallacy," and it applies in other domains as well. For example, even highly accurate medical tests are useless as diagnostic tools if the disease is rare in the general population, because the false positives from the healthy majority swamp the true positives from the few who are sick. Terrorist attacks are also rare, so any "test" is going to result in an endless stream of false alarms.
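To make the base rate problem concrete, here is a minimal sketch of the arithmetic in Python. Every number in it is a hypothetical I picked for illustration (the population size, the number of real targets, and both accuracy figures); none of them come from the Wired article or from any real system. The function name is mine as well. It just applies Bayes' theorem to ask: given that someone was flagged, how likely is it that the flag is real?

```python
def posterior_given_positive(base_rate, sensitivity, false_positive_rate):
    """Return P(real target | flagged) using Bayes' theorem."""
    true_pos = base_rate * sensitivity                    # flagged and real
    false_pos = (1.0 - base_rate) * false_positive_rate   # flagged but innocent
    return true_pos / (true_pos + false_pos)


# Hypothetical inputs, chosen only for illustration (assumptions, not data).
population = 300_000_000       # people being screened
actual_targets = 1_000         # real plotters hidden in that population
sensitivity = 0.99             # chance a real target gets flagged
false_positive_rate = 0.01     # chance an innocent person gets flagged

base_rate = actual_targets / population
p = posterior_given_positive(base_rate, sensitivity, false_positive_rate)

# The same calculation expressed as expected head counts.
expected_true_hits = actual_targets * sensitivity
expected_false_alarms = (population - actual_targets) * false_positive_rate

print(f"P(real target | flagged): {p:.6f}")
print(f"Expected true hits:       {expected_true_hits:,.0f}")
print(f"Expected false alarms:    {expected_false_alarms:,.0f}")
print(f"False alarms per hit:     {expected_false_alarms / expected_true_hits:,.0f}")
```

Under these made-up assumptions, a system that is "99% accurate" in both directions still produces on the order of three thousand false alarms for every real hit. Making the test more accurate helps less than you'd expect, because the ratio is driven by how rare the thing you're looking for is.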

This is exactly the sort of thing we saw with the NSA's eavesdropping program: the New York Times reported that the computers spat out thousands of tips per month. Every one of them turned out to be a false alarm.

And the cost was enormous -- not just for the FBI agents running around chasing dead-end leads instead of doing things that might actually make us safer, but also the cost in civil liberties. The fundamental freedoms that make our country the envy of the world are valuable, and not something that we should throw away lightly.

Data mining can work. It helps Visa keep the costs of fraud down, just as it helps Amazon alert me to books I might want to buy and Google show me advertising I'm more likely to be interested in. But these are all instances where the cost of false positives is low (a phone call from a Visa operator or an uninteresting ad) in systems that have value even if there is a high number of false negatives.

Finding terrorism plots is not a problem that lends itself to data mining. It's a needle-in-a-haystack problem, and throwing more hay on the pile doesn't make that problem any easier. We'd be far better off putting people in charge of investigating potential plots and letting them direct the computers, instead of putting the computers in charge and letting them decide who should be investigated.
