Counting previous occurences of an ID in a Dataframe within a certain date range

Question

I have a pandas dataframe containing dates of when a customer enters a shop. I'm looking for a method that will allow me to count the number of times a customer has visited a shop in the past month from the current Date_Visited including the current visit.

So, for a minimal dataset below

Customer_ID     Date_Visited (Year-Month-Day)
1               2020-07-10
2               2020-07-09
1               2020-01-01               
2               2020-07-08
1               2020-07-08
3               2020-07-01

I'm looking for an output of

Customer_ID     Date_Visited    visit_times
1               2020-07-10      2
2               2020-07-09      2
1               2020-01-01      1         
2               2020-07-08      1 
1               2020-07-08      1
3               2020-07-01      1

I've been able to use a solution involving loops - but this would be inefficient for large dataframes.

I've thought about trying to merge two copies of the dataframe and use a similar approach to that in Counting Number of Occurrences Between Dates (Given an ID value) From Another Dataframe but I'm not sure if this method is the best way to approach this problem

mozway · Accepted Answer

You can group by Customer_ID and year/month (using pandas.Grouper on the sorted dataframe (pandas.DataFrame.sort_values using the date column as key) and apply a cumcount per group (you need to add 1 as the count starts from 0 in python):

df['visit_times'] = (df.sort_values(by='Date_Visited (Year-Month-Day)')
                       .groupby(['Customer_ID',
                                 pd.Grouper(freq='M', key='Date_Visited (Year-Month-Day)')
                                ])
                       .cumcount()+1
                               )

output:

   Customer_ID Date_Visited (Year-Month-Day)  visit_times
0            1                    2020-07-10            2
1            2                    2020-07-09            2
2            1                    2020-01-01            1
3            2                    2020-07-08            1
4            1                    2020-07-08            1
5            3                    2020-07-01            1

Counting previous occurences of an ID in a Dataframe within a certain date range

Answers (1)

Related Questions