For Unsupervised, data complexity is an asset, not an obstacle.
The more variables you feed the Unsupervised AI, the better it runs.
Our AI engine automatically runs permutations of different algorithms on your data, learning which ones are most effective for your dataset. Many of these techniques come from the unsupervised learning field and were selected for their applicability to large-scale consumer data.
We use these advanced methodologies to discover a dataset’s inherent structure. This allows us to find nuanced patterns that most solutions miss.
Historically, unsupervised learning techniques haven’t been popular because their output is complex. Unsupervised has developed tools to make it easy to understand and act on patterns. The quicker you understand which consumer behaviors impact your business, the quicker you can use that knowledge to optimize the metrics that matter.
Unlike Unsupervised, supervised algorithms start with guesswork. This makes them vulnerable to error.
The concept of distance lies at the heart of most data science techniques. Ultimately, all algorithms are designed to compare pairs of points: they tell us how close, or similar, those points are to one another. In a dataset of consumer behavior, we find meaningful similarities between consumers by finding pairs of points that lie close together.
Machine learning algorithms in statistical packages primarily use Euclidean distance to measure closeness. But in a space with hundreds or thousands of dimensions, Euclidean distance loses its power to discriminate. The more dimensions you add, the sparser your data’s domain becomes, so very few pairs of points will seem meaningfully closer to one another than other possible pairs. All data will appear to be far apart, which is to say, dissimilar. The more complex your data becomes, the harder it is to find groups based on meaningful similarities.
This is known as the Curse of Dimensionality.
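The distance-concentration effect behind the Curse of Dimensionality can be demonstrated with a short simulation. This is an illustrative sketch on random data, not anything specific to Unsupervised's engine; the point counts and dimensions are arbitrary:

```python
import numpy as np

def distance_contrast(n_points=500, dims=2, seed=0):
    """Ratio of the farthest to the nearest Euclidean distance from a
    random reference point. As dims grows, the ratio shrinks toward 1:
    every point starts to look roughly equally far away."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dims))   # uniform points in the unit cube
    ref = rng.random(dims)                  # a random query point
    d = np.linalg.norm(points - ref, axis=1)
    return d.max() / d.min()

# Contrast collapses as dimensionality rises.
for dims in (2, 10, 100, 1000):
    print(dims, round(distance_contrast(dims=dims), 2))
```

In two dimensions the nearest neighbor is dramatically closer than the farthest one; by a thousand dimensions the ratio is close to 1, so "nearest" barely means anything.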
The Curse of Dimensionality is the reason analyzing complex data is so slow and cumbersome. It’s the reason commercial data scientists pour so much time into selecting columns. Knowing their tools work better with fewer columns, they’re incentivized to simplify the dataset as much as possible.
Enter human bias. Data selection, supposedly occurring before data analysis, is itself an act of analysis that influences what you learn. It is incredibly vulnerable to erroneous guesswork. Every choice to omit a variable reflects an assumption about how the world works and what factors might impact the business. Removing attributes in a dataset leads to a hazier picture of reality. And the hazier the picture, the harder it is to spot important patterns.
That’s just data selection. The analysis process also requires guesswork. Do we normalize the data, and if so, how? What algorithms do we use? During this phase, teams cycle through approaches one by one, each time examining a few features in hopes that a strong correlation will emerge.
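To see why even a choice as mundane as normalization counts as guesswork, consider a toy example (the consumer numbers below are hypothetical; only the scale effect matters). Which consumer looks "most similar" can flip entirely depending on whether columns are rescaled first:

```python
import numpy as np

# Toy consumer table: (age in years, annual spend in dollars).
X = np.array([
    [25.0, 50_000.0],   # query consumer
    [58.0, 50_050.0],   # very different age, near-identical spend
    [26.0, 56_000.0],   # near-identical age, different spend
    [70.0, 80_000.0],   # outlier, included so column statistics have spread
])

def nearest_to_first(M):
    """Index of the row closest to row 0 by Euclidean distance."""
    d = np.linalg.norm(M[1:] - M[0], axis=1)
    return int(np.argmin(d)) + 1

raw_nn = nearest_to_first(X)              # dollars dominate: the 58-year-old wins
Z = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score each column
scaled_nn = nearest_to_first(Z)           # age now matters: the 26-year-old wins
print(raw_nn, scaled_nn)
```

Neither answer is objectively "correct"; the analyst's normalization choice silently decides which consumers get grouped together.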
Unsupervised removes human bias from data analysis.
We remove the need to select specific data features, avoiding one source of human bias and analyzing features you would otherwise have missed.
Our approach circumvents the Curse of Dimensionality. We do so by using metrics beyond Euclidean Distance and applying a collection of techniques—including algebraic topology, unsupervised deep learners, and probabilistic mixtures—that work especially well on complex, incomplete consumer data.
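The text doesn't disclose which alternative metrics Unsupervised uses, but cosine distance is one standard non-Euclidean metric that illustrates the idea: it compares the *mix* of behavior rather than its magnitude. The purchase counts below are hypothetical:

```python
import numpy as np

def euclidean(u, v):
    return np.linalg.norm(u - v)

def cosine_distance(u, v):
    """1 minus cosine similarity: compares direction (behavior mix),
    ignoring magnitude (volume of activity)."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Purchase counts across three product categories.
light = np.array([2.0, 1.0, 0.0])    # light shopper
heavy = np.array([20.0, 10.0, 0.0])  # heavy shopper, same category mix
other = np.array([0.0, 1.0, 2.0])    # light shopper, different mix

# Euclidean says the two light shoppers are most alike...
print(euclidean(light, heavy), euclidean(light, other))
# ...cosine says the shoppers with the same category mix are most alike.
print(cosine_distance(light, heavy), cosine_distance(light, other))
```

Swapping the metric changes which consumers count as similar, which is why metric choice is part of the pattern-discovery problem rather than a fixed default.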
Complex datasets rarely map onto any one pattern or structure. Typically, they consist of hundreds of different patterns, each corresponding to a specific subset of the data. The problem with common methodologies is that they try fitting complex data systems into prefabricated patterns—and end up flattening nuance.
Our techniques don’t start with preconceived expectations. We discover all the patterns that exist in the data, not just the ones we anticipated. This lets us illuminate factors that matter in specific subsets of the dataset, even if they don’t matter anywhere else.
This approach has an added benefit, one you won’t find in methodologies that rely on clean data. Because our AI automatically identifies patterns in subsets of your data, the discovery of one pattern doesn’t affect the discovery of other patterns. This is valuable for real-world analysis of imperfect datasets. Even if your data isn’t clean, we will still discover useful, unpolluted patterns from the accurate subsets of the data.
Using these methodologies means the more complex data you run through our AI, the more nuanced your results will be. We are unique in the sense that you can’t overload Unsupervised with data. Our tool always benefits from additional detail.
In fact, we don’t begin engagements by cleaning or selecting data. Instead, we go through a data augmentation and expansion process to increase the data’s complexity—a method unique to Unsupervised. Essentially, it means we can uncover patterns in details that didn’t even exist in a dataset’s original form.
The process involves semantically tagging every column of input data, then letting our AI derive new features based on that semantic understanding of the data. This might mean identifying differentials between columns, deriving discounting levels, incorporating third-party demographics or social network information based on email addresses, or adding inferred information. For example, from timestamp data we can infer whether a purchase was made on a weekday or a weekend, on which day of the week, whether it fell on a holiday, and how many days it fell from the nearest holiday.
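The timestamp example above can be sketched as a small feature-derivation function. This is a minimal illustration, not Unsupervised's implementation; the fixed holiday list is hypothetical, and a production system would use a full holiday calendar:

```python
from datetime import date

# Hypothetical holiday list for illustration only.
HOLIDAYS = [date(2023, 1, 1), date(2023, 7, 4), date(2023, 12, 25)]

def derive_time_features(ts: date) -> dict:
    """Expand one purchase date into several derived columns."""
    days_from_holiday = min(abs((ts - h).days) for h in HOLIDAYS)
    return {
        "day_of_week": ts.strftime("%A"),
        "is_weekend": ts.weekday() >= 5,   # Saturday = 5, Sunday = 6
        "is_holiday": ts in HOLIDAYS,
        "days_from_holiday": days_from_holiday,
    }

print(derive_time_features(date(2023, 7, 8)))
```

A single raw timestamp column becomes four candidate features, any of which might carry a pattern the original column hid.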
Unsupervised’s output is immediately actionable.
The engine we’ve built automatically runs through thousands of permutations of unsupervised learning techniques, producing and evaluating many candidate ways to expose structure. The result is a large set of patterns discovered in your data.
Unsupervised takes the raw output of these technologies and instantly translates it into insights humans can use. Discovered patterns are prioritized against any business metric that you want to analyze, so you can easily see which patterns are most impactful to the things you care about.
Freed from thinking about feature selection, data structure, and data design, you can now focus on finding new datasets to tap for patterns, turning patterns into action plans, and spreading insights on how to engage consumers across the business.