Reputation: 23
I have a pandas dataframe containing dates of when a customer enters a shop. I'm looking for a method that will allow me to count the number of times a customer has visited a shop in the past month from the current Date_Visited including the current visit.
So, for a minimal dataset below
Customer_ID Date_Visited (Year-Month-Day)
1 2020-07-10
2 2020-07-09
1 2020-01-01
2 2020-07-08
1 2020-07-08
3 2020-07-01
I'm looking for an output of
Customer_ID Date_Visited visit_times
1 2020-07-10 2
2 2020-07-09 2
1 2020-01-01 1
2 2020-07-08 1
1 2020-07-08 1
3 2020-07-01 1
I've been able to use a solution involving loops - but this would be inefficient for large dataframes.
I've thought about trying to merge two copies of the dataframe and use a similar approach to that in Counting Number of Occurrences Between Dates (Given an ID value) From Another Dataframe but I'm not sure if this method is the best way to approach this problem
Upvotes: 2
Views: 495
Reputation: 260335
You can group by Customer_ID
and year/month (using pandas.Grouper
on the sorted dataframe (pandas.DataFrame.sort_values
using the date column as key
) and apply a cumcount
per group (you need to add 1 as the count starts from 0 in python):
df['visit_times'] = (df.sort_values(by='Date_Visited (Year-Month-Day)')
.groupby(['Customer_ID',
pd.Grouper(freq='M', key='Date_Visited (Year-Month-Day)')
])
.cumcount()+1
)
output:
Customer_ID Date_Visited (Year-Month-Day) visit_times
0 1 2020-07-10 2
1 2 2020-07-09 2
2 1 2020-01-01 1
3 2 2020-07-08 1
4 1 2020-07-08 1
5 3 2020-07-01 1
Upvotes: 2