Reputation: 553
I'm analyzing sales data from a shop and want to calculate the percentage of "first order customer" who turn into recurring customers in following month.
I have a DataFrame with all the orders. This includes a customer id, a date and a flag if this is his/her first order. This is my data:
import pandas as pd
data = {'Name': ['Tom', 'nick', 'krish', 'Tom'],
'First_order': [1, 1, 1, 0],
'Date' :['01-01-2018', '01-01-2018', '01-01-2018', '02-02-2018']}
df = pd.DataFrame(data)
I would now create a list of all new customers in January and a list of all recurring customers in February and inner-join them. Then I have two numbers with which I could calculate the percentage.
But I have no clue, how I could calculate this rolling for a whole year without looping over the data frame. Is there a nice pandas/python way to do so?
The goal would be to have a new dataframe with the month and the percentage of recurring customers from the previous month.
Upvotes: 4
Views: 1364
Reputation: 1500
One thought would be to take all orders Jan-November and have a column "reccurr" which gives you a true/false based on if this customer ordered in the next month. Then you can take a per-month groupby with count / sum of true/falses and add a column giving the ratio.
EDIT: before this you may need to convert dates:
df.Date = pd.to_datetime(df.Date)
Then:
df['month'] = df['Date'].apply(lambda x: x.month) #this is for simplicity's sake, not hard to extend to MMYYYY
df1 = df[df.month != 12].copy() #now we select everything but Nov
df1 = df1[df1.First_order == 1].copy() #and filter out non-first orders
df1['recurr'] = df1.apply(lambda x: True if len(df[(df.month == x.month + 1)&(df.Name == x.Name)])>0 else False, axis=1) #Now we fill a column with True if it finds an order from the same person next month
df2 = df1[['month','Name','recurr']].groupby('month').agg({'Name':'count','recurr':'sum'})
At this point, for each month, the "Name" column has number of first orders and "recurr" column has number of those that ordered again the following month. A simple extra column gives you percentage:
df2['percentage_of_recurring_customer'] = (df2.recurr/df2.Name)*100
EDIT: For any number of dates, here's a clumsy solution. Choose a start date and use that year's January as month 1, and number all months sequentially after that.
df.Date = pd.to_datetime(df.Date)
start_year = df.Date.min().year
def get_month_num(date):
return (date.year-start_year)*12+date.month
Now that we have a function to convert dates, the slightly changed code:
df['month'] = df['Date'].apply(lambda x: get_month_num(x))
df1 = df[df.First_order == 1].copy()
df1['recurr'] = df1.apply(lambda x: True if len(df[(df.month == x.month + 1)&(df.Name == x.Name)])>0 else False, axis=1)
df2 = df1[['month','Name','recurr']].groupby('month').agg({'Name':'count','recurr':'sum'})
Finally, you can make a function to revert your month numbers into dates:
def restore_month(month_num):
year = int(month_num/12)+start_year #int rounds down so we can do this.
month = month_num%12 #modulo gives us month
return pd.Timestamp(str(year)+'-'+str(month)+'-1') #This returns the first of that month
df3 = df2.reset_index().copy() #removing month from index so we can change it.
df3['month_date'] = df3['month'].apply(lambda x: restore_month(x))
Upvotes: 1