Essay Structure for Practical Data Community: Causal Analysis
I want to write something about causal models for Practical Data Community. It will be a mini-series of only three articles.
1st article, titled “Meet My New Friend, Causal Model”. This article will introduce what a causal model is, how it works (start with business understanding, encode it in a DAG, and eliminate confounders using backdoor adjustment), what its results look like, and the benefits of using it (it compounds your knowledge, and it works as an alignment tool: “We agreed that we will try X first because X leads to Y, which is impactful for our Retention”).
In the 1st article, I should also structure my explanation around the common saying “correlation is not causation”.
Quoting my Slack message below:
I plan to share more detail about this in Monday’s sharing session. But to condense what I’ve done here:
- (a) I think you folks are familiar with the “correlation is not causation” idiom. We can actually use it to explain this. Generally, there are two things we need to settle to arrive at causation:
  - (1) If A correlates with B, we need to know whether A influences B or the other way around.
  - (2) If A correlates with B and we believe that A influences B (variable A has an arrow to variable B), we need to make sure there’s no confounding variable between the two that keeps us from establishing causality.
- (b) We can then try to “solve” these two by:
  - For (1), we draw the causal model to know which variable influences which (and how each relates to the outcome), which I’ve done here.
  - For (2), after drawing the causal model, we need an operation/procedure to eliminate the confounders. This exists in R code, for example here.
- (c) The last step is to use a specific statistical method called “Inverse Probability Weighting”, which takes into account the causal model we’ve drawn and the confounders we’ve eliminated. The tutorial I used is here. For the relevant paper, check here.
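To make step (c) concrete for the article, here is a minimal Python sketch of Inverse Probability Weighting on simulated data. It is not the R code or the tutorial referenced above. Everything in it is an assumption for illustration: a single binary confounder z (standing in for whatever the DAG says we must adjust for), a binary treatment t, a binary outcome y, the made-up probabilities, and the helper names simulate, naive_difference and ipw_ate.

```python
import random


def simulate(n=20000, seed=0):
    """Simulate (z, t, y) rows where z confounds the effect of t on y.

    All probabilities are invented. By construction, the true causal
    effect of t on y is +0.10.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0                      # confounder
        t = 1 if rng.random() < (0.8 if z else 0.2) else 0      # z pushes treatment
        y = 1 if rng.random() < 0.3 + 0.1 * t + 0.4 * z else 0  # z also pushes outcome
        rows.append((z, t, y))
    return rows


def naive_difference(rows):
    """Plain treated-minus-control difference (correlation only)."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_ate(rows):
    """Average treatment effect via normalized inverse probability weights."""
    # Propensity score e(z) = P(T=1 | Z=z), estimated per level of z.
    e = {}
    for level in (0, 1):
        stratum = [t for z, t, _ in rows if z == level]
        e[level] = sum(stratum) / len(stratum)

    num1 = den1 = num0 = den0 = 0.0
    for z, t, y in rows:
        if t == 1:
            w = 1.0 / e[z]          # treated rows weighted by 1 / e(z)
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - e[z])  # control rows weighted by 1 / (1 - e(z))
            num0 += w * y
            den0 += w
    return num1 / den1 - num0 / den0


rows = simulate()
print("naive difference:", round(naive_difference(rows), 3))
print("IPW estimate:    ", round(ipw_ate(rows), 3))
```

In this setup the naive difference should come out well above the true +0.10, because z drives both treatment and outcome, while the IPW estimate should land near +0.10. With real data the propensity score would usually come from a model (e.g. logistic regression) over all the confounders picked from the DAG, not from a single binary variable.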
2nd article, titled “Warning: Your Causal Model Needs an Update!”. This article will show why we need to update the causal model (based on either stakeholders’ feedback or your own research), how to update it, and the results after the update.
3rd article, titled “I Introduced an LLM to Causal Model. They Became Best Friends”. It basically shows that the steps that still require a human (human in the loop) in the 1st and 2nd articles can be done by an LLM (at least some of them).
Quoted from my Slack message:
In my experience, churn analysis tends to be about “fixing the issue”, so I decided not to pursue the typical “EDA” or descriptive analytics that I’ve done in the past. Instead, I first list all the hypotheses I have, and then check them with causal inference methods against the outcome we want to achieve (which, in this case, is to increase the renewal rate for the Apr’25 cohort of yearly subs).
With causal inference, we can:
- Measure and prioritize the variables based on their causal effect (see “Causal Effect” column)
- Measure what the new renewal rate (intended outcome) would be if the hypothesis is true/the treatment is implemented for the whole population, and see the total revenue uplift (see “Uplift” column). This makes it easier for Tim, JJ and non-Data people to see the impact of our decision if we decide to act on a particular hypothesis.
So in a way, the only new thing I implemented here is the way I frame the “biz impact”/potential uplift part.
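A rough sketch of how the “Causal Effect” and “Uplift” columns could be derived is below. The formula is my assumption of the simplest version: uplift = (renewal rate if everyone got the treatment - current renewal rate) x cohort size x revenue per renewal. The hypothesis names, rates, cohort size and price are all made-up placeholders, not the real numbers from the analysis, and projected_uplift is a hypothetical helper.

```python
def projected_uplift(current_rate, rate_if_all_treated, cohort_size, revenue_per_renewal):
    """Return (extra_renewals, revenue_uplift) if the treatment reached everyone."""
    extra_renewals = (rate_if_all_treated - current_rate) * cohort_size
    return extra_renewals, extra_renewals * revenue_per_renewal


# Placeholder inputs: estimated renewal rate if the whole cohort were treated,
# one entry per hypothesis (e.g. taken from an IPW estimate).
current_rate = 0.40
hypotheses = {
    "hypothesis_a": 0.45,
    "hypothesis_b": 0.42,
}

# Rank hypotheses by causal effect (new rate minus current rate), largest first.
ranked = sorted(hypotheses.items(), key=lambda kv: kv[1] - current_rate, reverse=True)

for name, new_rate in ranked:
    extra, revenue = projected_uplift(
        current_rate, new_rate, cohort_size=1000, revenue_per_renewal=100.0
    )
    print(f"{name}: causal effect {new_rate - current_rate:+.2f}, "
          f"extra renewals {extra:.0f}, uplift {revenue:,.0f}")
```

The point of framing it this way is that each row reads as a decision: act on this hypothesis, expect roughly this many extra renewals and this much revenue, subject to the causal model being right.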