The Year a Black Swan Ate Our Models
A customer I spoke to recently summed up the current situation well. He said, “The black swan ate my models.”
Whether you serve consumers or other businesses, you’re likely in the same boat. The reality is that we’ve faced multiple black swan events recently. Those events have upended every aspect of daily life, how we do business, and, importantly, the models that analytics teams use to inform decision-making.
The two key sets of technology that analytics teams have relied on for the last ten years, modeling and business intelligence, are not built to cope with the ever-changing reality we face today. Recent events show that we must re-evaluate how data can support the business over the long term. We believe analytics teams need to:
- Learn how to throw out old assumptions
- Expand their data pools to include unconventional sources
- Incorporate unsupervised learning into their toolsets
How the Black Swan Ate Modeling
Supervised modeling is the approach most data science teams use. They train a model on historical data sets and use it to predict a range of possible future results.
Most simply put, data scientists create a model - fundamentally an equation - using historical inputs. That resulting model can then predict a value for future records, as long as external influences remain reasonably constant. When things in the world are stable, these predictive models can predict likely output in the near future.
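To make that concrete, here is a minimal sketch of a model fit to a stable history and then scored after conditions change. The weekly demand numbers are invented for illustration, and the straight-line fit stands in for whatever model a team actually uses.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def mean_abs_error(model, xs, ys):
    """Average absolute gap between the model's prediction and the actual value."""
    slope, intercept = model
    return mean(abs(slope * x + intercept - y) for x, y in zip(xs, ys))

# Hypothetical weekly demand: 20 stable weeks, then a shock changes the pattern.
weeks_before = list(range(20))
demand_before = [100 + 2 * w for w in weeks_before]
weeks_after = list(range(20, 26))
demand_after = [60 + 0.5 * w for w in weeks_after]

# The model only ever sees the stable history.
model = fit_line(weeks_before, demand_before)

error_stable = mean_abs_error(model, weeks_before, demand_before)
error_shocked = mean_abs_error(model, weeks_after, demand_after)
print(f"error on stable weeks: {error_stable:.2f}")
print(f"error after the shock: {error_shocked:.2f}")
```

The equation itself never changes. What changes is the world it is asked to describe, which is why the error jumps once the shock arrives.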
I am sure you can see the problem with assuming that things will “remain reasonably constant” in 2020. Or 2021. Or even in 2022.
Things aren’t constant day-to-day in 2020. What isn’t on fire on the West Coast is smothered in smoke, the U.S. is bracing for a “second wave” of COVID, and the unemployment rate, which was as low as 3.5% in February, spiked to 14.7% in April and is now sitting around 8.4%. Who knows what it will be tomorrow?
Our historical data doesn’t look a lot like what we see in our current world. Unfortunately, this means all of the models based on historical data aren’t really useful right now.
So why not build a new model?
We’re not generating data fast enough to build new models that can predict what will happen today. And infection rates, travel restrictions, and other non-COVID events are changing so quickly that even if you could magically snap your fingers and develop a new model, it wouldn’t be good for more than a few days.
There’s hope that old models are going to be useful again in the future. The problem is it’s not clear if the post-COVID world will look just like the pre-COVID world.
All of our historical data might be from a world that’s different. Supply chains have changed, perhaps for the long term. Consumer behavior and business buying behavior have drastically changed while people figure out how to cope with what’s happening today and are guessing what a new normal might look like.
So many things are changing that it’s not clear whether any of the traditional models we spent so much time and effort on will be useful again.
How the Black Swan Ate BI
In Business Intelligence, teams focus on two main functions:
- Dashboarding and reporting
- Ad hoc investigation
Dashboarding and reporting aren’t going away and are very much needed today. They help us understand where we stand at the moment and give leadership guidance on where to focus next. Most of us don’t have the context we need for reports to be as effective as they could be, but we’ll get into that more later.
Ad hoc investigation is much trickier today than it was a year ago.
Ad hoc investigation happens when leadership and analysts look at existing charts and dashboards and spot something they don’t understand. The questions that come out of these sightings are:
- Why is that happening?
- What are the things driving that?
Analyst teams have traditionally answered those questions by using their historical experience to come up with hypotheses. They take the things they’ve seen in the past and dig in to see whether the same thing, or a slight variation of it, happened again. Then they use their BI software to build charts that confirm or reject those hypotheses.
The problem is that what’s happening today doesn’t necessarily look like anything in the historical record. The traditional approach assumes that the data and charts used in the past can answer the questions we have today.
Unfortunately, that assumption does not hold today. We should not even assume our hypotheses are valid.
The ten years leading up to 2020 did not necessarily prepare analysts to be asking the right questions today.
How the Black Swan Ate Traditional Data Sources
In order to have a BI tool run smoothly, you need pre-defined, cleansed, and joined data to pull from. BI tools are plumbed straight into enterprise systems like data warehouses and enterprise planning systems, and then data is usually manipulated to be in a consistent format across multiple systems. BI tools are statically configured to analyze the data they have historically analyzed.
Normally that’s fine, but if you look at what’s going on today, we know that a lot of what is driving behavior lives in data that didn’t even exist a year ago. COVID-19 case counts by county, movement restriction orders that governors are putting in place, wildfire evacuation paths, and a city’s level of civil unrest all influence behavior.
But this information cannot easily be pulled into BI tools to mate up with enterprise data.
Unfortunately, the tooling and the data that are at the disposal of analytics teams are not really what is needed right now.
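For readers who want to picture what “mating up” external data with enterprise data involves, here is a minimal sketch done outside a BI tool. Every name and number in it is hypothetical: the sales rows, the case-count rows, and the choice of county and date as the shared key are assumptions made for illustration.

```python
# Hypothetical enterprise rows: daily unit sales per county.
sales = [
    {"county": "King", "date": "2020-09-01", "units": 120},
    {"county": "King", "date": "2020-09-02", "units": 80},
    {"county": "Pierce", "date": "2020-09-01", "units": 65},
]

# Hypothetical external rows: new COVID-19 cases per county per day.
cases = [
    {"county": "King", "date": "2020-09-01", "new_cases": 40},
    {"county": "King", "date": "2020-09-02", "new_cases": 95},
]

def left_join(left, right, keys):
    """Attach the right-hand columns to each left row that shares the same key values.

    Left rows with no match keep their data and get None for the added columns.
    """
    index = {tuple(row[k] for k in keys): row for row in right}
    right_cols = [c for c in right[0] if c not in keys]
    joined = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        extra = {c: (match[c] if match else None) for c in right_cols}
        joined.append({**row, **extra})
    return joined

enriched = left_join(sales, cases, keys=("county", "date"))
for row in enriched:
    print(row)
```

The join is the easy part. The sketch assumes both sources already agree on county names and date formats, which is exactly the cleansing work a statically configured pipeline has never done for data this new.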
So, What Does Work?
The techniques we see working very well right now are unsupervised learning techniques.
Unsupervised learning is the branch of machine learning that is ideal when humans haven’t seen an event before or aren’t sure where to begin investigating. The machine watches a broad range of data, even unstructured data, for patterns. Once patterns are flagged, humans can look at the data to understand what’s relevant, what should be investigated further, and what is good to know but can’t be controlled (like a natural disaster or global pandemic).
Unsupervised learning reduces selection bias because no one decides in advance which data the system reviews. This allows it to uncover patterns that are new, or that previously went undiscovered because of time constraints (imagine the nightmare of trying to apply some structure to unstructured data) or selection bias.
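As a small illustration of the idea, the sketch below flags unusual values without labels or a hypothesis about which column matters. It uses one simple label-free technique, distance from the median measured in units of median absolute deviation (MAD), and is not a description of any particular product. The store records and the threshold of 5.0 are hypothetical.

```python
from statistics import median

def robust_scores(values):
    """Score each value by its distance from the median, in MAD units."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # No spread to measure against, so nothing stands out.
        return [0.0 for _ in values]
    return [abs(v - center) / mad for v in values]

def flag_unusual(records, threshold=5.0):
    """Check every column and return (row index, column, score) for values above the threshold."""
    flagged = []
    for col in records[0].keys():
        scores = robust_scores([r[col] for r in records])
        for i, score in enumerate(scores):
            if score > threshold:
                flagged.append((i, col, round(score, 1)))
    return flagged

# Hypothetical daily records for one store.
records = [
    {"orders": 102, "returns": 5},
    {"orders": 98, "returns": 4},
    {"orders": 101, "returns": 6},
    {"orders": 97, "returns": 5},
    {"orders": 103, "returns": 21},
    {"orders": 40, "returns": 5},
    {"orders": 99, "returns": 4},
]

flagged = flag_unusual(records)
print(flagged)  # [(5, 'orders', 29.5), (4, 'returns', 16.0)]
```

Nobody told the code to look at orders or at returns. It scanned every column and surfaced two days worth a human’s attention, and deciding what those days mean is still the analyst’s job.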
At Unsupervised we are obviously very focused on unsupervised learning, and we’d love to talk to anybody who wants to explore our solution. But we think adapting reporting methods is so critical to our future success in the United States that you should explore other options as well. There are open-source tools and techniques you can use to add unsupervised learning to your arsenal.
What’s clear is that in a time of ahistorical change, relying on historical data won’t suffice. But we have more tools available to plot a path forward. If you’d like to hear how we can help, just reach out.