Counting the number of occurences of an event per used id

Question

I have a dataframe that has the information, over time, of the occurences of two events (booking and search) for user ID's. It looks like this:

event    user_id
booking  asdf81
search   frjl22
search   frjl22
booking  frjl22
search   asdf81

I'd like to calculate two columns based on this - num_bookings and num_searches.

So, the output dataframe would look like this:

event    user_id    num_bookings   num_searches
booking  asdf81     1              0     
search   frjl22     0              1
search   frjl22     0              2
booking  frjl22     1              2
search   asdf81     1              1

How can I achieve this in pandas?

jezrael · Accepted Answer

Idea is reshape event column to columns and use GroupBy.cumcount per all groups with removing DataFrame.dropna and then forward filling misisng values by GroupBy.ffill with replace missing values to 0 and last add to original by DataFrame.join:

df1 = df.set_index('event', append=True)['user_id'].unstack().add_prefix('num_')
df1 = pd.concat([df1.dropna(subset=[c]).groupby(c).cumcount().add(1) 
                                             for c in df1.columns], axis=1, keys=df1.columns)
print (df1)
event  num_booking  num_search
0              1.0         NaN
1              NaN         1.0
2              NaN         2.0
3              1.0         NaN
4              NaN         1.0

df = df.join(df1.groupby(df['user_id']).ffill().fillna(0).astype(int))
print (df)
     event user_id  num_booking  num_search
0  booking  asdf81            1           0
1   search  frjl22            0           1
2   search  frjl22            0           2
3  booking  frjl22            1           2
4   search  asdf81            1           1

Counting the number of occurences of an event per used id

Answers (1)

Related Questions