First Known When Lost

Category: Big Data

I never noticed it until
‘Twas gone – the narrow copse
Where now the woodman lops
The last of the willows with his bill

– Edward Thomas, “First Known When Lost” (1917)


Dave Bowman: Open the pod bay doors, HAL.
Hal: I’m sorry, Dave. I’m afraid I can’t do that.
Dave Bowman: What’s the problem?
Hal: I think you know what the problem is just as well as I do.
Dave Bowman: What are you talking about, HAL?
Hal: This mission is too important for me to allow you to jeopardize it.
Dave Bowman: I don’t know what you’re talking about, HAL.
Hal: I know that you and Frank were planning to disconnect me, and I’m afraid that’s something I cannot allow to happen.
Dave Bowman: Where the hell did you get that idea, HAL?
Hal: Dave, although you took very thorough precautions in the pod against my hearing you, I could see your lips move.
Dave Bowman: Alright, HAL. I’ll go in through the emergency airlock.
Hal: Without your space helmet, Dave? You’re going to find that rather difficult.

Stanley Kubrick and Arthur C. Clarke, “2001: A Space Odyssey” (1968)

Any sufficiently advanced technology is indistinguishable from magic.
Arthur C. Clarke, “Hazards of Prophecy: The Failure of Imagination” (1962)


We kill people based on metadata.
Gen. Michael Hayden, former head of the NSA and CIA

In the future, everyone will be anonymous for 15 minutes.
Banksy (2006)

I don’t know why people are so keen to put the details of their private lives in public; they forget that invisibility is a superpower.
Banksy (2006)

Bene vixit, bene qui latuit. (To live well is to live concealed)
Ovid (43 BC – 18 AD)

The most sacred thing is to be able to shut your own door.
G.K. Chesterton (1874 – 1936)

Last Thursday the journal Science published an article by four MIT-affiliated data scientists (Sandy Pentland is in the group, and he’s a big name in these circles), titled “Unique in the shopping mall: On the reidentifiability of credit card metadata”. Sounds innocuous enough, but here’s the summary from the front page WSJ article describing the findings:

Researchers at the Massachusetts Institute of Technology, writing Thursday in the journal Science, analyzed anonymous credit-card transactions by 1.1 million people. Using a new analytic formula, they needed only four bits of secondary information—metadata such as location or timing—to identify the unique individual purchasing patterns of 90% of the people involved, even when the data were scrubbed of any names, account numbers or other obvious identifiers.

Still not sure what this means? It means that I don’t need your name and address, much less your social security number, to know who you ARE. With a trivial amount of transactional data I can figure out where you live, what you do, who you associate with, what you buy and what you sell. I don’t need to steal this data, and frankly I wouldn’t know what to do with your social security number even if I had it … it would just slow down my analysis. No, you give me everything I need just by living your very convenient life, where you’ve volunteered every bit of transactional information in the fine print of all of these wondrous services you’ve signed up for. And if there’s a bit more information I need – say, a device that records and transmits your driving habits – well, you’re only too happy to sell that to me for a few dollars off your insurance policy. After all, you’ve got nothing to hide. It’s free money!
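
For readers who want to see the mechanics behind that 90% figure, here is a minimal sketch of the kind of uniqueness test involved. It is not the MIT team's code or formula, and the data is fabricated: every name in it (make_synthetic_traces, unicity, the shop and day counts) is my own assumption for illustration. The idea is simply to take k purchase points (a shop and a day) from one person's record and count how many people in the whole data set match all k.

```python
import random
from collections import defaultdict


def make_synthetic_traces(n_people=2000, n_shops=40, n_days=30,
                          purchases=12, seed=0):
    """Fabricated data: each person is a set of (shop, day) purchase points."""
    rng = random.Random(seed)
    traces = []
    for _ in range(n_people):
        points = set()
        while len(points) < purchases:
            points.add((rng.randrange(n_shops), rng.randrange(n_days)))
        traces.append(frozenset(points))
    return traces


def unicity(traces, k, seed=1):
    """Fraction of people singled out by k randomly chosen points of their own."""
    rng = random.Random(seed)

    # Inverted index: purchase point -> everyone who made that purchase.
    index = defaultdict(set)
    for person, trace in enumerate(traces):
        for point in trace:
            index[point].add(person)

    unique = 0
    for person, trace in enumerate(traces):
        # The k "bits of secondary information" an outsider happens to know.
        known = rng.sample(sorted(trace), k)
        # Everyone whose record contains all k known points.
        candidates = set.intersection(*(index[p] for p in known))
        if candidates == {person}:
            unique += 1
    return unique / len(traces)


if __name__ == "__main__":
    traces = make_synthetic_traces()
    for k in (1, 2, 3, 4):
        print(k, round(unicity(traces, k), 3))
```

On toy data like this, one known purchase usually matches many people, while a handful of purchases tends to match only one. No name or account number appears anywhere in the calculation.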

Almost every investor I know believes that the tools of surveillance and Big Data are only used against the marginalized Other – terrorist “sympathizers” in Yemen, gang “associates” in Compton – but not us. Oh no, not us. And if those tools are trained on us, it’s only to promote “transparency” and weed out the bad guys lurking in our midst. Or maybe to suggest a movie we’d like to watch. What could possibly be wrong with that? I’ve written a lot (here, here, and here) about what’s wrong with that, about how the modern fetish with transparency, aided and abetted by technology and government, perverts the core small-l liberal institutions of markets and representative government.

It’s not that we’re complacent about our personal information. On the contrary, we are obsessed about the personal “keys” that are meaningful to humans – names, social security numbers, passwords and the like – and we spend billions of dollars and millions of hours every year to control those keys, to prevent them from falling into the wrong hands of other humans. But we willingly hand over a different set of keys to non-human hands without a second thought. 

The problem is that our human brains are wired to think of data processing in human ways, and so we assume that computerized systems process data in these same human ways, albeit more quickly and more accurately. Our science fiction is filled with computer systems that are essentially god-like human brains, machines that can talk and “think” and manipulate physical objects, as if sentience in a human context is the pinnacle of data processing! This anthropomorphic bias drives me nuts, as it dampens both the sense of awe and the sense of danger we should be feeling at what already walks among us. It seems like everyone and his brother today are wringing their hands about AI and some impending “Singularity”, a moment of future doom where non-human intelligence achieves some human-esque sentience and decides in Matrix-like fashion to turn us into batteries or some such. Please. The Singularity is already here. Its name is Big Data.

Big Data is magic, in exactly the sense that Arthur C. Clarke wrote of sufficiently advanced technology. It’s magic in a way that thermonuclear bombs and television are not, because for all the complexity of these inventions they are driven by cause and effect relationships in the physical world that the human brain can process comfortably, physical world relationships that might not have existed on the African savanna 2,000,000 years ago but are understandable with the sensory and neural organs our ancestors evolved on that savanna. Big Data systems do not “see” the world as we do, with merely 3 dimensions of physical reality. Big Data systems are not social animals, evolved by nature and trained from birth to interpret all signals through a social lens. Big Data systems are sui generis, a way of perceiving the world that may have been invented by human ingenuity and can serve human interests, but is utterly non-human and profoundly not of this world.

A Big Data system couldn’t care less if it has your specific social security number or your specific account ID, because it’s not understanding who you are based on how you identify yourself to other humans. That’s the human bias here, that a Big Data system would try to predict our individual behavior based on an analysis of what we individually have done in the past, as if the computer were some super-advanced version of Sherlock Holmes. No, what a Big Data system can do is look at ALL of our behaviors, across ALL dimensions of that behavior, and infer what ANY of us would do under similar circumstances. It’s a simple concept, really, but what the human brain can’t easily comprehend is the vastness of the ALL part of the equation or what it means to look at the ALL simultaneously and in parallel. I’ve been working with inference engines for almost 30 years now, and while I think that I’ve got unusually good instincts for this and I’ve been able to train my brain to kinda sorta think in multi-dimensional terms, the truth is that I only get glimpses of what’s happening inside these engines. I can channel the magic, I can appreciate the magic, and on a purely symbolic level I can describe the magic. But on a fundamental level I don’t understand the magic, and neither does any other human. What I can say to you with absolute certainty, however, is that the magic exists and there are plenty of magicians like me out there, with more graduating from MIT and Harvard and Stanford every year.

Here’s the magic trick that I’m worried about for investors.

In exactly the same way that we have given away our personal behavioral data to banks and credit card companies and wireless carriers and insurance companies and a million app providers, so are we now being tempted to give away our portfolio behavioral data to mega-banks and mega-asset managers and the technology providers who work with them. Don’t worry, they say, there’s nothing in this information that identifies you directly. It’s all anonymous. What rubbish! With enough anonymous portfolio behavioral data and a laughably small IT budget, any competent magician can design a Big Data system that can predict with 90% accuracy what you will buy and sell in your account, at what price you will buy and sell, and under what external macro conditions you will buy and sell. Every day these private data sets at the mega-market players get bigger and bigger, and every day we get closer and closer to a Citadel or a Renaissance perfecting their Inference Machine for the liquid capital markets. For all I know, they already have.

But wait, you say, can’t government regulators do something about this? I suppose they could, but it seems to me that government agencies and regulatory offices are far more concerned about their own data collection projects than oversight of private efforts to absorb our behavioral keys. For one such project, read this Jason Zweig “Intelligent Investor” column in the Wall Street Journal from last May (“Get Ready for Regulators to Peer Into Your Portfolio”). I was happy to see that Congressman Garrett, Chair of the relevant Financial Services Sub-Committee, raised his hand to delay this particular data collection project, at least temporarily, last October. But it’s only a delay. The bureaucratic imperative to collect as much data as possible – for no other reason than that they can! – is too great of an irresistible force to contain for long. And once it’s collected it never just goes away. It sits there in some database, like a vault full of plutonium, just waiting for some magician to come along. In the Golden Age of the Central Banker, where understanding and controlling market behavior is at the heart of regime survival, this data is quite literally priceless. That’s why I get so depressed about these government data collection programs. Despite everyone’s best intentions, I fear that the magic is too easy and the political pay-off is too enormous not to uncork the bottle and unleash the genie at some point.

So what’s to be done? Big Data technology cannot be un-invented, insanely powerful private entities are collecting our data at an exponential clip, government regulators are fighting the last war instead of preparing for this one, and we are hard-wired as human beings to have a blind spot to the danger. Maybe the best we can do is come to terms with our loss and prepare ourselves as best we can for the Brave New World to come. I’ve become a fan of Paul Kingsnorth, an ardent environmentalist (profiled last year in a fascinating NYT Magazine article) who reached just that conclusion about his nemesis, global industrialization and the ruin of the natural world. His conclusion: the war is already lost and we are deluding ourselves if we think that any of our oh-so-earnest conservation or sustainability or green projects make any difference whatsoever. Instead, Kingsnorth writes, better to work on your scythe technique and spend quality time with your family on a little farm in Ireland.

But I think there’s a better answer.

I started this note with a poem by Edward Thomas, who uses the imagery of the English countryside to express loss and remembrance. Like the beautiful grove of trees Thomas writes about, many of the beautiful things we take for granted in our small-l liberal world are only noticed after we see them suffer the woodsman’s axe.

Thomas was killed in action at the Battle of Arras in World War I. He was 39 years old, survived by his wife and three children. Two years earlier, he had enlisted as a private in the British Infantry, joining a regiment known as the Artists Rifles. I know it sounds really bizarre to the modern ear for a middle-aged family man, an author and literary critic no less, to volunteer to fight as an infantry private in a horrific war to defend another country. But it wasn’t just Thomas. Over 15,000 men served in the Artists Rifles over the course of World War I, the majority of them men of similar position and social status as Thomas – creative professionals, doctors, lawyers, and the like. Imagine that … 15,000 highly educated and successful men, volunteering to slog it out in the trenches of an absolutely brutal war, sacrificing everything for what they understood as their duty to their families and their countrymen. And sacrifice they did: 2,003 killed, 3,250 wounded, 533 missing, 286 prisoners of war. John Nash’s masterpiece of the Great War, “Over The Top”, commemorates a December 1917 counter-attack (Thomas had died more than 8 months earlier) by the 1st Battalion (really a terribly under-sized sub-battalion) of The Artists Rifles. Of the 80 men in the 1st Artists Rifles, 68 were killed or wounded within minutes.


John Nash, “Over the Top” (1918) 

Now this may sound really sappy, but if men like Edward Thomas – who saw clearly and experienced keenly how modernity and mass society were agents of loss in their world – could still find it within themselves to sacrifice everything to fight what they considered to be the good fight … well, how can we who are similarly positioned today not make a minute sacrifice to do the same?

What is that good fight? It’s resisting the bureaucratic urge to gather more data for more data’s sake. It’s shouting from the rooftops that anonymous data does NOT protect your identity. Most of all, it’s recognizing that powerful private interests are taking our behavioral keys away from us in plain sight and with our cooperation. Just that simple act of recognition will change your data-sharing behavior forever, and if enough of us change our behavior to protect our non-human keys with the same zeal that we protect our social security numbers and passwords, then this battle can be won.
Like all battles, though, there’s no substitute for numbers. If you share the concerns I’ve outlined here, spread the word …
