Using data to help guide business strategy is the bread and butter of any professional data scientist.
Hypothesis testing is especially important, since it allows us to check whether an observed pattern reflects a genuine effect rather than chance: a particular marketing method genuinely leads to better conversion, certain salespeople are genuinely better at building customer loyalty, a new store layout genuinely produces higher sales, and so forth.
Given the sheer volume of data that may be at a company's disposal, it is important to structure thinking carefully. One way to do this is by creating an issue tree, which breaks down a single big problem into progressively smaller 'issues' that we can tackle more easily.
Suppose we were consulting for a food trading company, Northwind. If our goal was to think of ways to increase their profit, we could construct the following basic issue tree:
We can already start to think about hypotheses that we could test. The results of these hypothesis tests can then be turned into recommendations for the company's management. Because we used the issue tree, we know automatically that our analysis will be relevant and actionable.
Analysis conducted using Python libraries and APIs:
- SQLite3 and Pandas for data mining and munging
- Numpy for creating the Monte Carlo simulation functions
- Matplotlib for data visualisation
View the readme on the project's GitHub repo for a full breakdown of the project.
This presentation was designed to be delivered to a non-technical audience in under 10 minutes.
From the blog: "So, if faced with a sea (or indeed a lake) of data, how are we supposed to hit upon analysis that is guaranteed to be important, rather than a mere intellectual sideshow? In this context, good analysis stems from a well-constructed hypothesis, and a well-constructed hypothesis is the fruit of a well-constructed issue tree..."
The Key Question For All Data Scientists: “So What?”
How to Write Hypotheses and Run Analysis that Makes Your Boss Sit Up and Take Notice
The hypothesis testing was conducted using Monte Carlo simulations.
Implementing this in Python required a custom function, created using Numpy. Note: a gist for the subtract_array function used can be found here.
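To give a flavour of what a Monte Carlo hypothesis test can look like, here is a minimal sketch using Numpy. It is not the project's subtract_array function (that lives in the gist); it is a generic permutation-style test, and the function name monte_carlo_p_value, its parameters, and the toy data are all assumptions made up for illustration. The idea: if the null hypothesis is true, the group labels are interchangeable, so we shuffle the pooled data many times and count how often chance alone produces a difference in means at least as large as the one we observed.

```python
import numpy as np


def monte_carlo_p_value(group_a, group_b, n_simulations=10_000, seed=0):
    """Estimate a one-sided p-value for mean(group_a) > mean(group_b).

    Hypothetical helper for illustration only: shuffles the pooled data,
    re-splits it into two groups of the original sizes, and counts how
    often the simulated difference in means matches or beats the observed one.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)

    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])

    count = 0
    for _ in range(n_simulations):
        shuffled = rng.permutation(pooled)
        simulated = shuffled[:a.size].mean() - shuffled[a.size:].mean()
        if simulated >= observed:
            count += 1

    # Add-one correction so the estimate is never exactly zero
    return (count + 1) / (n_simulations + 1)


# Toy data (invented, not taken from the Northwind database)
rng = np.random.default_rng(42)
group_a = rng.normal(loc=27, scale=5, size=200)
group_b = rng.normal(loc=22, scale=5, size=200)

p_value = monte_carlo_p_value(group_a, group_b)
print(f"Estimated p-value: {p_value:.4f}")
```

A small p-value suggests the observed difference would be unlikely if the two groups really came from the same distribution. The explicit loop keeps the logic easy to follow; a vectorised version would be faster but harder to read.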