Ele
Ele

Reputation: 553

Calculate recurring customer

I'm analyzing sales data from a shop and want to calculate the percentage of "first order customer" who turn into recurring customers in following month.

I have a DataFrame with all the orders. This includes a customer id, a date and a flag if this is his/her first order. This is my data:

import pandas as pd 

data = {'Name': ['Tom', 'nick', 'krish', 'Tom'], 
        'First_order': [1, 1, 1, 0], 
        'Date' :['01-01-2018', '01-01-2018', '01-01-2018', '02-02-2018']} 

df = pd.DataFrame(data) 

I would now create a list of all new customers in January and a list of all recurring customers in February and inner-join them. Then I have two numbers with which I could calculate the percentage.

But I have no clue, how I could calculate this rolling for a whole year without looping over the data frame. Is there a nice pandas/python way to do so?

The goal would be to have a new dataframe with the month and the percentage of recurring customers from the previous month.

Upvotes: 4

Views: 1364

Answers (1)

Jim Eisenberg
Jim Eisenberg

Reputation: 1500

One thought would be to take all orders Jan-November and have a column "reccurr" which gives you a true/false based on if this customer ordered in the next month. Then you can take a per-month groupby with count / sum of true/falses and add a column giving the ratio.

EDIT: before this you may need to convert dates:

df.Date = pd.to_datetime(df.Date)

Then:

df['month'] = df['Date'].apply(lambda x: x.month) #this is for simplicity's sake, not hard to extend to MMYYYY
df1 = df[df.month != 12].copy() #now we select everything but Nov
df1 = df1[df1.First_order == 1].copy() #and filter out non-first orders
df1['recurr'] = df1.apply(lambda x: True if len(df[(df.month == x.month + 1)&(df.Name == x.Name)])>0 else False, axis=1) #Now we fill a column with True if it finds an order from the same person next month
df2 = df1[['month','Name','recurr']].groupby('month').agg({'Name':'count','recurr':'sum'})

At this point, for each month, the "Name" column has number of first orders and "recurr" column has number of those that ordered again the following month. A simple extra column gives you percentage:

df2['percentage_of_recurring_customer'] = (df2.recurr/df2.Name)*100

EDIT: For any number of dates, here's a clumsy solution. Choose a start date and use that year's January as month 1, and number all months sequentially after that.

df.Date = pd.to_datetime(df.Date)
start_year = df.Date.min().year
def get_month_num(date):
    return (date.year-start_year)*12+date.month

Now that we have a function to convert dates, the slightly changed code:

df['month'] = df['Date'].apply(lambda x: get_month_num(x))
df1 = df[df.First_order == 1].copy()
df1['recurr'] = df1.apply(lambda x: True if len(df[(df.month == x.month + 1)&(df.Name == x.Name)])>0 else False, axis=1)
df2 = df1[['month','Name','recurr']].groupby('month').agg({'Name':'count','recurr':'sum'})

Finally, you can make a function to revert your month numbers into dates:

def restore_month(month_num):
    year = int(month_num/12)+start_year #int rounds down so we can do this.
    month = month_num%12 #modulo gives us month
    return pd.Timestamp(str(year)+'-'+str(month)+'-1') #This returns the first of that month
df3 = df2.reset_index().copy() #removing month from index so we can change it.
df3['month_date'] = df3['month'].apply(lambda x: restore_month(x))

Upvotes: 1

Related Questions