Data Science: How to frame business problems as machine learning problems (Part II)
In the previous articles, we discussed when to use machine learning to solve a business problem and how to frame a business problem as a machine learning problem.
The mind map of where we are going is this: machine learning is about developing a function/algorithm (F) that uses a set of scenarios (S) to optimize an objective function (L) and, in turn, improve the business metric (M). We want the algorithm (F) to be generic enough that, when it is fed a new scenario (S’), it still generates an output that meets the objective function (L).
So let us get on with it:
(1) Define the set of scenarios (S): Scenarios consist of a set of inputs and outputs. S: I => O
For example, the set of scenarios in our search algorithm was users landing on our website and using various keywords to search for inventory. In this case, the keywords are the inputs and the inventory items returned are the outputs. We want to return the most appropriate output for each input. However, if you have searched on Amazon, you will have noticed that the algorithm rarely returns just one item; instead, it returns a set of items ranked in order. So the output is not only the items but also their ranking, and instead of calling our problem the ‘Search problem’, we will call it the ‘Search ranking problem’. Now that we understand our scenario, we can rephrase the business problem from improving the search experience to a more formal algorithmic construct: in what order should the inventory items be displayed so that the items most likely to be clicked by users appear higher up in the ranking? Done well, this gives you happy customers; done poorly, it yields frustration and, worse, users lost forever.
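To make the scenario concrete, here is a minimal sketch of the search ranking framing: the input is a keyword string and the output is an ordered list of items. The `Item` type, `rank_items`, and `toy_click_score` are hypothetical names for illustration; `toy_click_score` is only a placeholder for a learned model that estimates how likely a user is to click an item for a given query.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Item:
    item_id: str
    title: str


def rank_items(query: str, candidates: List[Item],
               click_score: Callable[[str, Item], float]) -> List[Item]:
    """Order candidates so the item most likely to be clicked comes first.

    click_score(query, item) stands in for a model that estimates how likely a
    user who searched for `query` is to click `item` (hypothetical).
    """
    return sorted(candidates, key=lambda item: click_score(query, item), reverse=True)


def toy_click_score(query: str, item: Item) -> float:
    # Hypothetical placeholder for a learned model: share of query words found in the title.
    query_words = query.lower().split()
    title_words = set(item.title.lower().split())
    return sum(word in title_words for word in query_words) / len(query_words)


# Scenario: the input is the keyword string, the output is a ranked list of items.
catalog = [Item("a", "Garden hose"), Item("b", "Red running shoes"), Item("c", "Running socks")]
ranked = rank_items("red running shoes", catalog, toy_click_score)
print([item.item_id for item in ranked])  # ['b', 'c', 'a']
```

Passing the scoring function in as a parameter keeps the framing (input, ranked output) separate from whichever model eventually produces the scores.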
(2) Define the output variable (O): What is the output variable, and what is its type: discrete or continuous? For example, predicting how many customers will convert if we raise prices by $100 has a numeric output (usually treated as continuous) and may call for a regression approach, whereas predicting whether a customer will convert if we show an artificial discount on the search page has a discrete output (in this case binary: converting or not converting) and hence may call for a classification approach.
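The discrete-versus-continuous distinction can be sketched as a rough heuristic that looks at example values of the output variable. This is only an illustration under simple assumptions (`suggest_task` is a hypothetical helper); in practice the choice should follow from the business question, not from the data types alone.

```python
def suggest_task(outputs):
    """Rough heuristic: suggest a task family from example values of the output (O).

    Real projects should decide this with the business question in mind; this only
    illustrates the discrete-vs-continuous distinction.
    """
    distinct = set(outputs)
    if not distinct:
        raise ValueError("need at least one example output")
    # Matches True/False as well, because True == 1 and False == 0 in Python.
    if distinct <= {0, 1}:
        return "binary classification"
    if all(isinstance(value, str) for value in distinct):
        return "binary classification" if len(distinct) == 2 else "multi-class classification"
    if all(isinstance(value, (int, float)) for value in distinct):
        return "regression"
    raise ValueError("mixed output types; define the output variable more precisely")


# Hypothetical example outputs for the two questions in the text.
print(suggest_task([120, 95, 143]))        # conversions after a price change -> regression
print(suggest_task([True, False, False]))  # converted or not -> binary classification
```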
(3) Define the objective function (L): Our goal, if you remember, was to map the input (I) to the output (O), but how do we know the mapping is right? Unfortunately, the business problems we work on rarely have black-or-white answers. For example, there is no single right home when users search for a three-bedroom on the Airbnb website. So instead we focus on the optimal mapping. What does an ‘optimal’ mapping mean, and how do we quantify it? This is where the objective function comes into the picture. One way to understand it is as a satisficing condition: it helps us narrow down to a good-enough solution, but continuing to reduce it does not necessarily lead to a better solution. So in the forest of possible solutions and stopping conditions, the objective function acts as a map to identify the ‘right enough’ tree.
Thus we will recast our business problem as an optimization problem, in which we try to minimize (or maximize) the objective function to achieve the desired business metric (M). Note that a minimization problem can easily be converted into a maximization problem and vice versa: finding the solution that minimizes L is the same as finding the one that maximizes -L.
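Writing θ for whatever the algorithm F is allowed to tune (θ is an illustrative symbol, not one this article defines), the equivalence can be stated precisely: the same θ solves both problems, and the optimal values differ only in sign.

```latex
\arg\min_{\theta} L(\theta) = \arg\max_{\theta} \bigl(-L(\theta)\bigr),
\qquad
\min_{\theta} L(\theta) = -\max_{\theta} \bigl(-L(\theta)\bigr)
```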
So what should the objective function be for our search ranking problem? As we discussed, we want to put the most relevant items at the top of the list so that users perform fewer clicks. The hypothesis is that if users can find the items sooner, they will convert at a higher rate. So one idea is to measure how often users clicked on items lower in the list rather than the ones at the top. Each such out-of-order pair is called an ‘inversion’, so the objective function would be to minimize inversions. If the user always clicks the top item first and only then moves on to the next item in the list, there are 0 inversions. However, if the user clicks on item #4 and not on item #1, the pair (#1, #4) counts as one inversion, i.e., based on user behavior, item #4 should have been ranked above item #1.
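A minimal sketch of the inversion count, assuming we only log which positions were clicked in a session (not the click order) and treat every unclicked item ranked above a clicked one as an inverted pair. This "skip-above" reading is one common interpretation; the article does not pin down the exact definition, and `count_inversions` is a hypothetical helper.

```python
def count_inversions(clicked_positions):
    """Count inversions in one search session from the positions the user clicked.

    Positions are 1-indexed ranks. A pair (i, j) with i < j is one inversion when
    the item at rank j was clicked but the higher-ranked item at rank i was not,
    i.e. the click suggests item j should have been ranked above item i.
    """
    clicked = set(clicked_positions)
    if any(position < 1 for position in clicked):
        raise ValueError("positions are 1-indexed")
    return sum(1 for j in clicked for i in range(1, j) if i not in clicked)


print(count_inversions([1, 2]))  # clicked top-down: 0 inversions
print(count_inversions([1, 4]))  # skipped #2 and #3: 2 inversions
print(count_inversions([4]))     # skipped #1, #2 and #3: 3 inversions
```

Under this definition, clicking only #4 yields three inversions, because the pairs (#1, #4), (#2, #4), and (#3, #4) are all out of order; the (#1, #4) pair from the text is one of them.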
We could also measure the distance of an inversion, i.e., penalize more when the user clicks on item #12 without clicking on item #1 than when the user clicks on item #4 without clicking on item #1.
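A distance-weighted variant can be sketched by charging each inverted pair (i, j) a cost of j - i instead of 1. The linear weight is an assumption chosen for simplicity, and `weighted_inversions` is a hypothetical helper; any increasing function of the distance would express the same idea.

```python
def weighted_inversions(clicked_positions):
    """Distance-weighted inversion score for one search session.

    A pair (i, j), i < j, is inverted when rank j was clicked but the
    higher-ranked i was not. Each inverted pair costs j - i instead of 1,
    so skipping far down the list is penalized more.
    """
    clicked = set(clicked_positions)
    return sum(j - i for j in clicked for i in range(1, j) if i not in clicked)


print(weighted_inversions([4]))   # 3 + 2 + 1 = 6
print(weighted_inversions([12]))  # 11 + 10 + ... + 1 = 66
```

The pair (#1, #12) alone costs 11 here versus 3 for (#1, #4), matching the intuition in the text.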
A key point to remember is to avoid over-complicating the objective function while making sure we do not lose important business information in the process. In the example above, we could use either a simple count of inversions or also include the distance of each inversion; which one to use depends on whether that distance matters for the business scenario. For example, if users never go beyond the first page of results, there is a higher risk of losing customer interest if item #1 appears in the place of item #20 on the second page than if item #2 appears in the place of item #5 on the same page.
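If users never reach page two, clicks there are never observed, so a click-based count cannot see the first case in this example. One way to capture it, sketched below under stated assumptions, is to compare the displayed order with an ideal order taken from hypothetical relevance labels and charge extra when an inverted pair spans a page break. `PAGE_SIZE`, `cross_page_weight`, and both functions are hypothetical choices, not an established metric.

```python
PAGE_SIZE = 10  # hypothetical: number of results shown per page


def page_of(position, page_size=PAGE_SIZE):
    """1-indexed page number for a 1-indexed display position."""
    return (position - 1) // page_size + 1


def page_aware_cost(displayed, ideal_rank, page_size=PAGE_SIZE, cross_page_weight=10):
    """Cost of a displayed ranking measured against an ideal one.

    displayed:   item ids in display order (position 1 first).
    ideal_rank:  item id -> ideal 1-indexed rank (hypothetical relevance labels).
    Each inverted pair (a better item shown below a worse one) costs 1 when both
    items share a page and cross_page_weight (hypothetical) when it spans pages.
    """
    cost = 0
    for p, upper in enumerate(displayed, start=1):
        for q in range(p + 1, len(displayed) + 1):
            lower = displayed[q - 1]
            if ideal_rank[lower] < ideal_rank[upper]:
                same_page = page_of(p, page_size) == page_of(q, page_size)
                cost += 1 if same_page else cross_page_weight
    return cost


ideal = {item: item for item in range(1, 21)}  # item k ideally sits at position k

swap_1_20 = list(range(1, 21))
swap_1_20[0], swap_1_20[19] = swap_1_20[19], swap_1_20[0]  # item #1 pushed to position 20

swap_2_5 = list(range(1, 21))
swap_2_5[1], swap_2_5[4] = swap_2_5[4], swap_2_5[1]  # item #2 shown at position 5

print(page_aware_cost(swap_1_20, ideal))  # 208
print(page_aware_cost(swap_2_5, ideal))   # 5
```

With `cross_page_weight=1` this reduces to a plain pairwise inversion count (37 versus 5 for the same two rankings), which shows how much of the gap comes from the page boundary rather than from the count itself.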
(4) Define the model metric (M’): You can use the objective function above to define your model metric. For example, while minimizing the number of inversions is the objective function, the number of inversions itself is the model metric: the lower it is, the better your model performs (although with diminishing business returns).
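One possible way to turn per-session counts into a single model metric is to average them over many search sessions, as sketched below. The click logs are invented for illustration, and comparing two rankers this way assumes their sessions come from comparable traffic (for example, an A/B test).

```python
from statistics import mean


def count_inversions(clicked_positions):
    """Pairs (i, j), i < j, where rank j was clicked but the higher rank i was not."""
    clicked = set(clicked_positions)
    return sum(1 for j in clicked for i in range(1, j) if i not in clicked)


def mean_inversions(sessions):
    """Model metric (M'): average inversions per search session; lower is better."""
    if not sessions:
        raise ValueError("need at least one session")
    return mean([count_inversions(clicks) for clicks in sessions])


# Hypothetical click logs (clicked positions per session) for two ranking models.
model_a_sessions = [[1], [1, 2], [3]]
model_b_sessions = [[1], [1], [2]]
print(mean_inversions(model_a_sessions))
print(mean_inversions(model_b_sessions))
```

Here model B scores lower (about 0.33 versus 0.67 inversions per session), so by this metric it ranks better.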
I strongly recommend involving the business stakeholders when you define the scenarios (S), the inputs (I) and outputs (O), the objective function (L), and the model metric (M’), which should ideally be a proxy for the business metric (M). The more involved they are at this stage, the higher the chances that the team will be able to:
a) Manage business expectations
b) Gain buy-in and adoption
c) Explain the complexity of the solution
d) Use their expertise to identify the right inputs, the objective function, and any additional constraints (also called business rules).
(5) When in doubt, always choose the simplest model: Explainability is paramount in business situations. When conversion is falling off a cliff, blaming it on the algorithm by saying machine learning is a black box rarely works. So unless the uplift from a more complex model is tangible and worth the headache, choose the simpler one. It will make debugging and refinement much easier and more economical.
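"Tangible and worth the headache" can be made explicit by agreeing on a minimum uplift with stakeholders before comparing models. The sketch below assumes a lower-is-better model metric (such as average inversions) and a hypothetical 5% relative threshold; `choose_model` is a hypothetical helper, and the right threshold is a business decision.

```python
def choose_model(simple_metric, complex_metric, min_relative_uplift=0.05):
    """Pick 'simple' unless the complex model cuts a lower-is-better metric enough.

    min_relative_uplift is a hypothetical threshold to agree on with stakeholders.
    """
    if simple_metric <= 0:
        return "simple"  # nothing left to improve on a lower-is-better metric
    uplift = (simple_metric - complex_metric) / simple_metric
    return "complex" if uplift >= min_relative_uplift else "simple"


print(choose_model(0.50, 0.49))  # 2% better: not worth it -> simple
print(choose_model(0.50, 0.40))  # 20% better -> complex
```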
That is it for this article. In the next article, we will discuss the last mile of a machine learning project, which is conveying the solution to the business users.