Missing the Gorilla in the Data
Imagine you’re watching a basketball game. The opposing team is wearing white uniforms and the home team is wearing black uniforms. The opposing team is moving the ball around, passing back and forth around the half court.
Now, imagine in the middle of this action a gorilla walks onto the court, thumps her chest, and then ambles out of view.
You’d definitely notice that, right?
Your answer is probably an emphatic “yes,” but should more likely be “maybe.”
Harvard psychologists tested this same scenario in a famous experiment. They asked subjects to watch a basketball game in which one team played in white shirts and the other wore black shirts. The psychologists instructed subjects to press a button whenever a player in white successfully passed the ball. When a woman in a gorilla suit walked into the view of the camera, thumped her chest, then walked off, half of the subjects didn’t even notice.
This phenomenon is known as inattentional blindness: the failure to notice unexpected events while concentrating on something else. It’s cited as a reason texting while driving should be banned, and as an explanation for why we miss lower prices when shopping at a grocery store.
It may also explain why large enterprises routinely fail to see important patterns and insights in their data.
Inattentional Blindness in Data Analysis
Understanding how we miss valuable insights requires an examination of the way most data analysis is performed in large organizations.
Typically, analysis begins with a hypothesis: someone, maybe a business lead or an analyst, has an idea about some event or attribute affecting the business. That person then attempts to test the hypothesis by working with internal teams, as time and available resources allow, to gather, join, and clean up data for analysis.
Beyond the time and expense involved, the problem with this traditional approach is that it surfaces only the patterns that would prove or disprove a single hypothesis, and ignores everything else that could be affecting the business.
As an example, let’s say a major retailer forms the hypothesis that a line of clothing is experiencing a drop in sales due to seasonality. To test this, the retailer would examine sales data by month, looking for changes in consumption that correspond to a general change in the weather.
But what would this actually tell us? That some products sell better in the winter than in the summer. Besides changing up the product mix, the analysis leaves us with few options. And worse yet, the narrow hypothesis may have blinded us to the gorillas in the data: the unrecognized patterns that have a simple, direct impact on sales.
This is because we’re focused on analysis for the sake of proving out a theory, not analysis for the purpose of generating theories.
Returning to the sales example, some of the patterns that could be missed might include:
- Steady or rising sales in particular regions even as the clothing line’s total sales sag
- A significant attachment rate for the clothing line among customers purchasing another product whose overall sales are increasing
- Increased sales among loyal customers with lower average spend per shopping session following a markdown of the clothing line
- A rise in third-party online sales for the clothing during the same period
Each of these is far more actionable than the general hypothesis. Starting with a hypothesis blinds us to impactful patterns that seem buried in the data but are more likely staring us in the face.
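To make the first pattern in the list concrete, here is a minimal sketch in Python of breaking a sagging total down by region. The records, period labels, and function names are all invented for illustration; a real analysis would read from the retailer's own sales data.

```python
from collections import defaultdict

# Hypothetical records: (period, region, units_sold). All values are invented.
SALES = [
    ("2023-Q1", "North", 120), ("2023-Q2", "North", 90),
    ("2023-Q1", "South", 100), ("2023-Q2", "South", 70),
    ("2023-Q1", "West", 60), ("2023-Q2", "West", 85),
]


def change_by_region(records, first, last):
    """Return {region: units sold in `last` minus units sold in `first`}."""
    totals = defaultdict(lambda: defaultdict(int))
    for period, region, units in records:
        totals[region][period] += units
    return {region: per[last] - per[first] for region, per in totals.items()}


def regions_against_trend(records, first, last):
    """List regions whose change has the opposite sign of the overall change."""
    changes = change_by_region(records, first, last)
    overall = sum(changes.values())
    return sorted(r for r, c in changes.items() if c * overall < 0)


changes = change_by_region(SALES, "2023-Q1", "2023-Q2")
print("overall change:", sum(changes.values()))
print("regions moving against it:", regions_against_trend(SALES, "2023-Q1", "2023-Q2"))
```

In this made-up data the total falls while one region grows, which is exactly the kind of detail a month-by-month seasonality check would never surface.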
Removing the Blinders
In life, the methods to avoid inattentional blindness are fairly straightforward (e.g., don’t text and drive). But in business, avoiding it involves a different way of approaching data analysis: starting with the analysis of the data, and then forming a hypothesis.
Unlike supervised machine learning methods, unsupervised learning makes it possible to start with the analysis and then form hypotheses based on it. By analyzing both structured and unstructured data across all data sources, the business can identify patterns and rank them by their impact on a KPI (e.g., increased sales, customer churn, employee retention, or fraud risk). These patterns form the insights both an analyst and business lead can act on. (And more often than not, the insight is startlingly simple.)
Unsupervised learning mitigates inattentional blindness. Instead of sharply focusing on a single hypothesis, the business is exposed to many possible patterns affecting the underlying concern. Rather than trying to describe an entire landscape while staring at a single blade of grass, the analyst can view the entire vista, then choose where to focus.
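The idea of ranking patterns by their impact on a KPI can be sketched in a few lines of Python. Note that this is a simple exhaustive segment scan standing in for a real unsupervised learning method, and the customer records, field names, and impact formula are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical customer records; every field name and value is invented.
CUSTOMERS = [
    {"region": "North", "channel": "store", "loyalty": "yes", "churned": 0},
    {"region": "North", "channel": "online", "loyalty": "no", "churned": 1},
    {"region": "South", "channel": "online", "loyalty": "no", "churned": 1},
    {"region": "South", "channel": "store", "loyalty": "yes", "churned": 0},
    {"region": "West", "channel": "online", "loyalty": "yes", "churned": 0},
    {"region": "West", "channel": "online", "loyalty": "no", "churned": 1},
    {"region": "North", "channel": "store", "loyalty": "no", "churned": 0},
    {"region": "South", "channel": "store", "loyalty": "no", "churned": 1},
]


def rank_segments(rows, kpi):
    """Rank every (attribute, value) segment by its impact on the KPI.

    Impact (an assumed definition) = KPI total observed in the segment minus
    the total expected if the segment matched the overall average. A large
    segment with a modest deviation can outrank a tiny one with a big deviation.
    """
    overall = sum(row[kpi] for row in rows) / len(rows)
    groups = defaultdict(list)
    for row in rows:
        for attr, value in row.items():
            if attr != kpi:
                groups[(attr, value)].append(row[kpi])
    scored = [
        (attr, value, len(vals), sum(vals) - len(vals) * overall)
        for (attr, value), vals in groups.items()
    ]
    # Largest absolute impact first; no hypothesis picks the attribute in advance.
    return sorted(scored, key=lambda s: abs(s[3]), reverse=True)


ranking = rank_segments(CUSTOMERS, "churned")
for attr, value, size, impact in ranking[:3]:
    print(f"{attr}={value} n={size} impact={impact:+.2f}")
```

The point of the design is that every attribute is scanned and scored the same way, so the analyst sees which segments matter most before committing to a theory about why.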
Inattentional blindness is not a human vice. It’s simply a limitation of cognitive ability when people are employing their superior skills to focus. At its best, AI allows people to augment their focus skills with unparalleled pattern recognition. It ensures we can see all the gorillas in our midst.