Just as chemistry is about more than test tubes, data science and the generation of insights are about more than amassing huge volumes of data and applying algorithms to it.
Almost every enterprise, irrespective of size and nature of business operations, is focused on analytics. Business problems are technically complex, and Chief Analytics Officers are turning to big data analytical platforms for solutions. For traditional analytics problems, both the data needed and the process are known and well-defined. But the less familiar you are with a given type of problem, the market, and the customers, the less relevant the conventional analytics approach becomes.
For example, by putting together data from various sources and applying sophisticated algorithms, we can generate astonishingly detailed pictures of some aspects of the customer; however, these pictures are far from complete and are often misleading. It may be possible to predict a customer’s next mouse click or purchase, but no amount of quantitative data can tell us why the customer made that click or purchase.
Generating insights is all about having information that reduces uncertainty: the better the information, the less the uncertainty. And how do we generate these insights? Conventional wisdom says that if we torture the data enough, it will tell us something, but what it tells us may not generalize beyond the data we are looking at.
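The risk of "torturing the data" can be made concrete with a small sketch (not from the text, and using NumPy purely for illustration): fit a flexible model to pure noise and it will find a "pattern" that evaporates on fresh data drawn from the same process.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pure noise: there is no real signal to discover.
x = np.linspace(0.0, 1.0, 20)
y_train = rng.normal(size=20)
y_test = rng.normal(size=20)  # fresh noise from the same process

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# "Torture" the training noise with an over-flexible polynomial.
coeffs = np.polyfit(x, y_train, deg=10)

train_err = mse(coeffs, x, y_train)
test_err = mse(coeffs, x, y_test)

# The model explains the data it saw, not the process behind it:
print(train_err < test_err)
```

The fitted curve tracks the training points closely, yet its error on the held-out draw is larger: the "insight" extracted was an artifact of that particular sample, which is exactly the failure mode the text warns about.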
Each data-driven business decision-making instance is unique, comprising its own combination of expected outcomes, constraints, and even categories of hypotheses or trains of thought. However, in the rush to reduce the problem statement to strings of 1s and 0s and apply algorithms to generate insights, we lose sight of understanding the business problem. Many people approach insight generation as an engineering process, prescribing a set of common tasks and training people on tools and technologies. This is in direct conflict with the very fundamentals of data science.