Root Cause Analysis with Metric Trees as Causal Models

Causal analysis is the most common and often the highest-ROI type of business analysis. If you’ve spent any time in business, you’ll recognize causal analysis as any analysis that follows a question that starts with “Why.”

Answering these questions requires us to identify the cause of the phenomenon. Only then can we remediate the situation or exploit the newfound insight.

Why has web traffic from New York doubled since last week? Why did buyers start to choose the free plan instead of a paid plan? Why has usage of our chat feature declined?

A common causal dilemma

A particularly gruesome example of this question has plagued go-to-market functions for decades: Why are sales-qualified leads down this month?

As soon as the question is uttered, sales and marketing teams dig into their respective positions. Leaders summon their analysts to present evidence that supports their perspective and deflects blame.

However, this handoff introduces its own set of problems. As the analysis moves further into the realm of analytical specialization, each layer of expertise removes the investigation further from the original business context. Specialists, while highly skilled in their domain, may find themselves disconnected from the practical realities and nuances of the business operation that prompted the analysis.

The analysis cycle slows until it loses its initial intent, veering off into interesting but ultimately irrelevant tangents. Amidst this turmoil, the Highest Paid Person’s Opinion (HIPPO) frequently dominates the discourse. Decisions become more about hierarchy and less about data-driven insights, exacerbating the divide and leaving the original "Why" question only superficially addressed or altogether forgotten.

Ultimately, functions work faster and harder at anything they think might stick, throwing solutions at the problem in the hope something will resolve the issue without truly understanding it.

In such an environment, the importance of rigorous causal analysis becomes even more pronounced. It's not just about finding an answer quickly; it's about finding the right answer that truly addresses the core of the problem. Without a systematic approach to understanding causal relationships, teams risk running in circles, expending energy on strategies that may not tackle the underlying issues.

This cycle highlights the critical need for a unified and purposeful approach to data analysis, one that bridges the gap between high-level decision-makers and the nuanced insights that data specialists can provide. Only by binding the structural business knowledge to the analytical processes can organizations hope to move beyond superficial answers and towards meaningful solutions that lead to tangible change.

Don’t worry. The purpose of this guide is not to bemoan the inherent challenges of analytics. It’s to provide a solution that democratizes the root cause analysis process—a framework to perform the analysis that shortens execution time and reduces ambiguity in the process. 

It should come as no surprise that this solution comes in the form of (or rather builds upon) the concept of metric trees. Metric trees, by design, offer a causal model for metric changes that happen over time.

Let’s look at what a causal model is and why metric trees excel at filling that role.

What is a causal model?

You are probably familiar with the concept of a data model. If you work with data directly, you’ve learned (often the hard way) how data models can make or break your analytics efforts.

They are fundamental to understanding and communicating about business operations through the lens of data. They are imperative to our work's longevity, reusability, and validity.

However effective they are at providing the semantics for discourse and analysis, they fall short in providing the syntax for causation.

In other words, it's helpful to understand that a customer used to belong to the free plan and now belongs to a paid plan, and that plan type is worth some amount of revenue. But it doesn’t inherently capture why they upgraded, what interventions we might perform to upgrade others like them, and how long we’re likely to retain them.

This is where a causal model comes in.

A causal model is a conceptual model (using the generic definition here, not the data modeling definition) “that describes the causal mechanisms of a system.”

These models help us understand, explain, and simulate cause and effect in things like, say, a sales funnel. The basic premise is that if we know the causal relationship between given states and events, we can understand why things happen, predict what will happen, and design interventions to affect outcomes. 

There are a number of ways to represent a causal model, but all of them share the same basic concept: there are nodes and edges, and the edges are all directed with an arrow implying causation. A simple model might look like this: A → B → C or… I cut the tree → the tree fell → it made a loud noise.

Boom! I caused the noise. Or phrased differently, the noise happened because I cut the tree. 
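
To make the nodes-and-edges idea concrete, here is a minimal sketch of that chain as a directed graph in Python. The dictionary layout and the `upstream_causes` helper are illustrative assumptions, not part of any particular tool; the point is only that once the edges are written down, "why did this happen?" becomes a walk backwards along the arrows.

```python
# Hypothetical causal chain from the text: each key causes the events in its list.
causes = {
    "I cut the tree": ["the tree fell"],
    "the tree fell": ["it made a loud noise"],
    "it made a loud noise": [],
}


def upstream_causes(graph, effect):
    """Return every event that leads to `effect`, nearest cause first."""
    found = []
    frontier = [effect]
    while frontier:
        current = frontier.pop(0)
        for cause, effects in graph.items():
            # An edge cause -> current means `cause` is one step upstream.
            if current in effects and cause not in found:
                found.append(cause)
                frontier.append(cause)
    return found


print(upstream_causes(causes, "it made a loud noise"))
# ['the tree fell', 'I cut the tree']
```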

All this talk about nodes and edges probably brings to mind something familiar: DAGs. As in dbt, data transformation flows move in one direction and never loop back. Workflows built this way are also, in fact, causal models. Causal models in general are a little different because they can loop. Consider viral models!
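
The difference between a one-directional graph and a causal model with a feedback loop can be checked mechanically. The sketch below is a plain depth-first cycle check; the two example graphs and all of their node names are made up for illustration.

```python
def has_cycle(graph):
    """Depth-first search that reports whether any path loops back on itself."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # We came back to a node that is still on the current path.
        visiting.add(node)
        looped = any(visit(nxt) for nxt in graph.get(node, []))
        visiting.discard(node)
        done.add(node)
        return looped

    return any(visit(node) for node in graph)


# Hypothetical pipeline: strictly one-directional, like a DAG of transformations.
pipeline = {"raw_events": ["sessions"], "sessions": ["daily_report"], "daily_report": []}

# Hypothetical viral loop: users send invites, invites produce signups,
# and signups become active users who send more invites.
viral_loop = {
    "active_users": ["invites_sent"],
    "invites_sent": ["signups"],
    "signups": ["active_users"],
}

print(has_cycle(pipeline))    # False
print(has_cycle(viral_loop))  # True
```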

The diagram below shows part of a sales funnel as a causal diagram, also reflected as a metric tree. Each condition can be expressed as a precondition (a “component cause”) for metrics further up the metric tree.

So what?

If you started this guide reading about metric trees, you’ll recognize that causal models look a lot like metric trees. That is because metric trees are, in fact, causal models. As such, they enjoy a couple of handy properties:

  • They are great for establishing and sharing conceptual “truth” about the world.
  • They provide the syntax for explanation: “SQLs are down because qualification rate is down.”

Or, from a different perspective, causal models (and metric trees) provide...

  • The syntax for causal explanation: “SQLs are down because qualification rate is down” 
  • The syntax for prediction: “If we increase the qualification rate, we’ll have more SQLs”
  • The framework for modeling a different version of the world
  • The framework for planning and experiment design. (If we do X, then we should expect Y.)
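
Here is a rough sketch of what explanation and prediction look like once an edge is written as a formula. It assumes that SQLs equal Sales Accepted Leads multiplied by the Sales Qualification Rate, which is one plausible way to define that branch of the tree. The function name and every number below are invented for illustration.

```python
def sqls(sales_accepted_leads, qualification_rate):
    """Assumed formula: SQLs = Sales Accepted Leads x Sales Qualification Rate."""
    return sales_accepted_leads * qualification_rate


# Hypothetical values for two months.
last_month = {"sales_accepted_leads": 400, "qualification_rate": 0.25}
this_month = {"sales_accepted_leads": 400, "qualification_rate": 0.20}

baseline = sqls(**last_month)
actual = sqls(**this_month)

# Explanation: move one input at a time and see how far the output moves.
due_to_rate = sqls(last_month["sales_accepted_leads"], this_month["qualification_rate"]) - baseline
due_to_volume = sqls(this_month["sales_accepted_leads"], last_month["qualification_rate"]) - baseline

print(f"SQLs moved from {baseline:.0f} to {actual:.0f}")
print(f"Change if only the rate had moved: {due_to_rate:+.0f}")
print(f"Change if only the volume had moved: {due_to_volume:+.0f}")

# Prediction: "If we increase the qualification rate, we'll have more SQLs."
what_if = sqls(this_month["sales_accepted_leads"], 0.30)
print(f"Predicted SQLs at a 30% qualification rate: {what_if:.0f}")
```

With these made-up numbers the whole decline traces back to the rate, which is exactly the sentence “SQLs are down because qualification rate is down.” The same function, called with a different input, is the "different version of the world" used for planning.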

You probably already had an intuition about this stuff. Now you have the “edges” to tie it all together.

What isn’t a causal model?

There is a common fallacy that causation can be inferred through statistical models alone. The problem with this view is that these relationships can be held together by nothing more than spurious correlation. Without a contextual understanding of the phenomena, these relationships may defy logic and reason, rendering them impossible to communicate or act upon.

This problem is common among overly complex SaaS prediction tools that operate on the level of raw data: a wash of rows and columns, metrics, and dimensions instead of commonly understood data models. This two-dimensional view of the world does not provide enough context about the world and the causation within it. You can tell quite intuitively from the images above that metrics are a better abstraction to express causation.

Let’s take this conceptual foundation and develop a practical application on top.

The Root Cause Analysis Process

If you read up on Root Cause Analysis (RCA), you’ll soon understand that RCA has five steps. Nobody knows who came up with the steps or when, but everybody accepts these steps as doctrine. (Shout out to Toyota in the 1970s and all management scientists and the Lean “Black Belts” since.)

The five steps to root cause analysis:

1. Define the Problem

2. Collect Data

3. Identify Possible Causal Factors

4. Determine the Root Cause(s)

5. Implement and Monitor Corrective Actions

While these five steps would have been revolutionary in the 1970s, they seem somewhat shallow, if not obvious, to anybody working in the era of cloud-based analytics and data-driven operations.

Steps one and two speak for themselves. If you have even the most rudimentary reporting capabilities, you started collecting data years ago and you identify problems as they surface on your BI dashboards.

Step three is where things get interesting.

Identifying causal factors with metric trees

A common analytical approach for this phase of analysis is the “Five Whys” approach. It’s easy to understand because it’s all in the name. To determine the cause of a phenomenon, you simply ask “why.” As you peel back each layer of causation, you get closer to the cause. Often, this process takes about five iterations.

This is similar to the “drill-down” pattern in analytics. Let’s apply this to a basic business question:

“Why are Sales Qualified Leads down this month?”

You might first explore the effect of seasonality. If that doesn’t explain the decline, you might look earlier in the funnel by “traversing down the tree” and assessing whether there were issues with the volume of Sales Accepted Leads or Sales Qualification Rate.

Let’s say you discover an issue with the Lead Acceptance Rate. Asking “why?” again, you could look at the factors that might affect it, such as the quality of prospects from different channels or changes to the lead scoring model.

Through this process, you would have scanned across a comprehensive set of factors that cause metric drift: temporal variance, component drift, influence drift, dimension shifts, and perhaps relevant external events.

If you had approached this analysis with a metric tree as a causal model, you would have a standardized structure with all the information you need to execute the analysis. Each metric has a time aspect, is composed of components or influences, and is sliceable into dimensions.

If you’ve done the work of constructing the tree up front, you can just recurse your way down the tree to swiftly identify causes.
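
As a rough illustration of "recursing down the tree," the sketch below walks a tiny metric tree and, at each level, follows the input with the steepest relative decline. It covers only the component walk and ignores seasonality, dimensions, and external events. The `Metric` class, the `trace_decline` function, the "Leads" node, and all values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Metric:
    """One node of a hypothetical metric tree, with a value for two periods."""
    name: str
    previous: float
    current: float
    inputs: list[Metric] = field(default_factory=list)

    @property
    def relative_change(self) -> float:
        return (self.current - self.previous) / self.previous


def trace_decline(metric: Metric, path: list[str] | None = None) -> list[str]:
    """Follow the input with the steepest relative decline until none declines."""
    path = (path or []) + [metric.name]
    declining = [m for m in metric.inputs if m.relative_change < 0]
    if not declining:
        return path
    worst = min(declining, key=lambda m: m.relative_change)
    return trace_decline(worst, path)


# Made-up numbers: SQLs fell because fewer leads were accepted by sales.
tree = Metric("Sales Qualified Leads", 100, 80, [
    Metric("Sales Accepted Leads", 400, 320, [
        Metric("Leads", 800, 800),
        Metric("Lead Acceptance Rate", 0.50, 0.40),
    ]),
    Metric("Sales Qualification Rate", 0.25, 0.25),
])

print(" -> ".join(trace_decline(tree)))
# Sales Qualified Leads -> Sales Accepted Leads -> Lead Acceptance Rate
```

Each recursive call is one "why?" in the Five Whys. The walk stops at a metric whose inputs show no decline, which is where you would switch to slicing by dimension or looking for external events.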

If this simplicity sounds appealing, you’ll like the following article, Five Causal Factors of Metric Drift, which will describe in detail how to perform a root cause analysis with a metric tree.

💡 Guidance to readers

As you read this guide, you probably fall into one of two camps: students of analytics seeking to broaden your view of the world or analysts currently conducting a root cause analysis. Depending on your position, we have some suggestions for how you proceed.

  1. If you are reading this for educational purposes, keep this approach to RCA in mind for when it’s time to design a metric tree. Thinking about the application will help you design a more useful tree from the start.
  2. If you’re following this guide to carry out the procedure right now, following the analysis process below will compel you to assemble a metric tree. Keep the newly assembled tree. You can use it as a starting point for a larger metric tree or plug the tree into a larger tree where it fits. You’ll be happy that you completed so much work in advance!

Now, with your purpose in mind, let’s carry on to the next article about identifying causal factors.
