Saturday, January 28, 2017

Basic Data Mining for Customer Segmentation using Logistic Regression

Logistic regression is a data mining technique that banks use to determine things such as the risk associated with a customer. If a customer is above a certain risk limit, then services like overdrafts are not extended to that customer. Any marketing company can apply similar data mining to find out which of its customers are likely to pay, or are capable of paying, for its products; for a gaming company, for example, that means paying to play the game. A combination of variables such as age, sex, location, salary and education can yield this important information. Each variable has its own weight (coefficient), and variables with larger weights have more influence on the prediction, provided the variables are measured on comparable scales. Part of the historical data can be used as training data to estimate the coefficients, and the model can then be tested against the rest of the historical data to check its accuracy.
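The train-then-test idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the method behind this post: the customer data is synthetic (randomly generated from made-up coefficients), the two features are assumed to be already scaled, and the 75/25 split, learning rate and number of epochs are arbitrary choices. The coefficients are estimated by batch gradient descent on the log-loss; in practice a library such as scikit-learn's `LogisticRegression` would usually do this step.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + e^(-z))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(beta, x):
    # beta[0] is the intercept, beta[1:] are the per-variable coefficients
    return sigmoid(beta[0] + sum(b * xi for b, xi in zip(beta[1:], x)))

def fit_logistic(rows, labels, lr=0.5, epochs=300):
    """Estimate coefficients (intercept first) by batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    beta = [0.0] * (n_features + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(rows, labels):
            err = predict_proba(beta, x) - y  # gradient of log-loss w.r.t. z
            grad[0] += err
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta

def accuracy(beta, rows, labels, threshold=0.5):
    # Fraction of customers whose predicted class (pays / does not pay) matches the label
    correct = sum((predict_proba(beta, x) >= threshold) == bool(y)
                  for x, y in zip(rows, labels))
    return correct / len(rows)

# Synthetic "historical" data: two hypothetical, already-scaled features per customer.
random.seed(0)
true_beta = [-0.5, 2.0, 1.0]  # made-up coefficients used only to simulate who paid
rows, labels = [], []
for _ in range(400):
    x = [random.gauss(0, 1), random.gauss(0, 1)]
    rows.append(x)
    labels.append(1 if random.random() < predict_proba(true_beta, x) else 0)

# Train on the first 75% of the history, test on the remaining 25%.
split = int(0.75 * len(rows))
beta = fit_logistic(rows[:split], labels[:split])
held_out_acc = accuracy(beta, rows[split:], labels[split:])
print("estimated coefficients:", [round(b, 2) for b in beta])
print("held-out accuracy:", round(held_out_acc, 3))
```

The held-out accuracy is the number to watch: if it is not good enough, the variables or the data need another look before the model is used for decisions.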

Once the results are accurate enough, logistic regression can be used to estimate the probability that a customer will pay for virtual currency in the next day, week or month. Customers can also be segmented by their paying probability and paying capacity, so that better decisions can be made about how much to spend on acquiring each group. This can save a lot of money by cutting spending on acquiring customers who will not pay. A similar technique can be used to find influencers and attract them to play games, helping games go viral. The process is simple and saves money by identifying which customers are capable of paying and targeting campaigns at that user base. Many kinds of segmentation are possible with this simple data mining method.
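Segmenting by paying probability and paying capacity can be as simple as two thresholds. The sketch below is hypothetical throughout: the `Customer` fields, the 0.5 probability cut-off, the 20-dollar capacity cut-off and the segment names are illustrative assumptions, and `pay_probability` is assumed to come from an already-fitted logistic model.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    customer_id: str
    pay_probability: float  # model output between 0 and 1 (assumed already computed)
    paying_capacity: float  # hypothetical estimate of monthly spend, in dollars

def segment(c, prob_cut=0.5, capacity_cut=20.0):
    """Assign one of four hypothetical segments using two illustrative thresholds."""
    likely = c.pay_probability >= prob_cut
    high_capacity = c.paying_capacity >= capacity_cut
    if likely and high_capacity:
        return "high priority"
    if likely:
        return "medium priority"
    if high_capacity:
        return "low priority"
    return "do not spend on acquisition"

customers = [
    Customer("a", 0.82, 45.0),
    Customer("b", 0.67, 5.0),
    Customer("c", 0.21, 60.0),
    Customer("d", 0.08, 2.0),
]
segments = {c.customer_id: segment(c) for c in customers}
print(segments)
```

Acquisition spending can then be concentrated on the first segments and cut for the last one; the thresholds would need tuning against real campaign results.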

Example: Suppose the variables that matter most in predicting whether a customer will pay are Age, Education and Salary, with these coefficients:
β0 = 1 (the intercept)
β1 = 2
β2 = 3
β3 = 4
x1 = Age
x2 = Education in years above high school
x3 = Salary in dollars above 50000

The model can hence be expressed as:
$P(\text{conversion to paying customer}) = \frac{1}{1 + e^{-z}}$, where $z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 = 1 + 2x_1 + 3x_2 + 4x_3$
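The model can be written as a small Python function. This is a sketch that hard-codes the example coefficients from this post; the function names are hypothetical. The logistic function is computed in a form that avoids overflow when $z$ is very large or very negative.

```python
import math

# Example coefficients: beta0 (intercept), beta1 (age),
# beta2 (education above high school), beta3 (salary above 50,000).
BETA = (1.0, 2.0, 3.0, 4.0)

def linear_score(age, education_years, salary_above_50k):
    """z = beta0 + beta1*x1 + beta2*x2 + beta3*x3"""
    b0, b1, b2, b3 = BETA
    return b0 + b1 * age + b2 * education_years + b3 * salary_above_50k

def conversion_probability(age, education_years, salary_above_50k):
    """P = 1 / (1 + e^(-z)), evaluated without overflow for large |z|."""
    z = linear_score(age, education_years, salary_above_50k)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# With all inputs at zero, z = 1 and P = 1 / (1 + e^-1), about 0.731.
print(conversion_probability(0, 0, 0))
```

Because every coefficient is positive, raising any one input while holding the others fixed raises the probability.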

Because all three coefficients are positive, the probability of paying increases with age, education and salary.
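So, for a customer who is 24 years old, has studied for 7 years after high school and earns a salary of 100,000 dollars, $x_1 = 24$, $x_2 = 7$ and $x_3 = 100000 - 50000 = 50000$. Then $z = 1 + 2(24) + 3(7) + 4(50000) = 200070$, and the probability of conversion to a paying customer is $\frac{1}{1 + e^{-200070}}$, which is essentially 1.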
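The arithmetic for this customer can be checked in a few lines of Python. The sketch plugs the customer's stated age (24) in for $x_1$, as the variable definitions specify, and uses the example coefficients from this post.

```python
import math

b0, b1, b2, b3 = 1, 2, 3, 4          # example coefficients
age = 24                              # x1: age in years
education = 7                         # x2: years of education above high school
salary_above_50k = 100_000 - 50_000   # x3: salary in dollars above 50,000

z = b0 + b1 * age + b2 * education + b3 * salary_above_50k
# math.exp(-200070) underflows to 0.0, so the probability evaluates to exactly 1.0
p = 1.0 / (1.0 + math.exp(-z))

print(z)  # 200070
print(p)  # 1.0
```

The salary term alone contributes 200,000 to $z$, so with these illustrative coefficients any noticeable salary difference from 50,000 pushes the probability to essentially 0 or 1. This is why inputs are usually rescaled (for example, salary in thousands of dollars) before the coefficients are estimated.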

Once the segmentation of customers and their probability of conversion to paying customers are understood, informed spending decisions can be made. Advertisements can be targeted only at the desired segment, and games can be designed to cater to the paying audience.
