The title is a real question; I'd like to hear what people think.
Our department was given a briefing on yet another huge company-wide initiative to aggregate and coalesce all data, allowing us to develop relationships across whole departments and sectors of the businesses we run. It's a tremendous opportunity, and one which is needed if you consider what Facebook and Google are doing with data (among many other firms that have well-developed data management groups).
I had several questions about the project. For one, was there a revenue impact which was expected to offset the cost, and if so how was it calculated? What was the timeline for introduction at departmental and company-wide levels? What were the expectations of the use of the data? Was it better to implement in a piecemeal fashion, department by department - continuing the current path we are on - or was their top-down approach more efficient and likely to yield better results? Each question received an answer, sometimes dismissive, which led to more questions.
I was viewed negatively for my inquisitiveness. I explained I wasn't opposed to the project, but that I'd seen projects like this many times. None have worked as expected and most never paid off. These were not reasons to avoid doing it, but it is good to ask questions and be sure. I was told to 'trust' the data scientists, none of whom I know, and not to stand in the way. I acquiesced, and ceased my questions. Groupthink is a powerful thing. Data was here to save our business, I was assured.
On the train ride home, I ran into a colleague from another department who is much closer to this project, and he told me even more about it. For one, it was the third attempt by this team to implement the 'vision' (so much for trust!). For another, they were abandoning all the work done in the previous two attempts and starting from scratch, meaning work which had been done on all the old systems had to be reassessed and either tossed or transferred to newer platforms. Finally, they'd spent exorbitant sums of money already, to the point that break-even was probably 10 years off, assuming they met their four-year timeline. He listened to my questions and nodded, saying they were all the right questions and there was good reason to question the nature and scope of this project.
Google, Facebook and all the other firms with huge data systems had the benefit of being young and starting from scratch while new technologies were being introduced. This is how business works; it's part of the process of creative destruction. The newer companies benefit from untried but potentially beneficial products, living or dying by their ability to manage and incorporate these ideas and technologies. Older companies have to try to keep up, and many are incapable of doing so. However, these older firms need to be careful about the implementation. Data is as much about art as it is about what the data tells us. Sometimes less is more. Sometimes your gut tells you as much as $10 million worth of information does. I have seen people collect information on months-long projects only to confirm suggestions which were made at the outset. The delays cost money. There are rare, very rare, occasions when the data tells us something different. Sometimes the reason it tells us something different is simply the time it took to collect the data. Perhaps this is a form of Heisenberg's uncertainty principle played out in the realm of business.
I am a huge believer in collecting and managing data. My job relies on it. But as I tell my boss, data and technology are like Stradivarius violins. You can give me a Stradivarius and I will make awful noise with it. Give it to a concert violinist, and beautiful music is made. The same is true of data. Many data scientists today, I've found, make very basic mistakes in their assumptions about what data tells them. The most common is confusing correlation with causation. I have had arguments with PhDs over this very issue when they present correlative data without demonstrating any causal link.
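To make the correlation-versus-causation point concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the series names (`season`, `ad_spend`, `sales`), the noise levels and the sample size are all assumptions, not data from any real project. Two simulated series are both driven by a hidden common cause and never influence each other, yet they correlate strongly until that common cause is accounted for.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """Remove the linear effect of xs from ys with a least-squares fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


def simulate(n=5000, seed=0):
    """Simulated (hypothetical) data: one hidden driver, two observed series."""
    rng = random.Random(seed)
    # Hidden common cause, e.g. seasonal demand (an assumption for this sketch).
    season = [rng.gauss(0, 1) for _ in range(n)]
    # Each observed series depends on season plus its own noise.
    # Neither series has any effect on the other.
    ad_spend = [s + rng.gauss(0, 0.5) for s in season]
    sales = [s + rng.gauss(0, 0.5) for s in season]
    return season, ad_spend, sales


season, ad_spend, sales = simulate()

# Naive view: the two series look tightly linked.
raw = pearson(ad_spend, sales)

# After removing the part of each series explained by season,
# what is left over is essentially unrelated.
controlled = pearson(residuals(ad_spend, season), residuals(sales, season))

print(f"raw correlation:        {raw:.2f}")
print(f"controlling for season: {controlled:.2f}")
```

In this toy setup the raw correlation comes out strongly positive while the controlled one sits close to zero. A presentation that showed only the first number would invite exactly the conclusion the data cannot support.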
Baseball is a great example of this point. Sabermetrics has revived and increased my interest in the game. Yet sabermetrics has limits. A cute, sappy movie, Trouble with the Curve, illustrates where data intersects with knowledge and experience. Data can provide support, but it takes experience to know what that data is telling you.
Dr. Joy Bliss recently posted about this issue, as the problem has infected even the realm of medicine and health.
Data can do many things. But the last thing it should be used for is policy-making, because data is typically utilized under the 'pretense of knowledge' and applied in a fashion that has unintended consequences. The data may also have politics built in, politics which don't benefit you.
Michael Crichton famously warned us of the problem of politicized science and data. Sadly, many intelligent people remain ignorant of the dangers of misplaced trust in data, demonizing critics without explaining fully why the critics' logic is flawed.
A company, like the one which employs me, is just as likely to politicize positions. We call it groupthink. In my briefing, I was not part of the groupthink. I enjoy being on the outside. I may be wrong at times, but when I am, I'm happy to know that I have played the role of Captain Obvious, asking difficult questions in a way that opens up the thought process further, if it can be opened up further. Sadly, as I watch what happens in the office, I begin to understand why Progressives remain so prevalent in our society. They are incapable of moving past groupthink. If everyone else is doing it, it must be good, right?