neuron
neuron

Reputation: 559

Counting the number of occurences of an event per used id

I have a dataframe that has the information, over time, of the occurences of two events (booking and search) for user ID's. It looks like this:

event    user_id
booking  asdf81
search   frjl22
search   frjl22
booking  frjl22
search   asdf81

I'd like to calculate two columns based on this - num_bookings and num_searches.

So, the output dataframe would look like this:

event    user_id    num_bookings   num_searches
booking  asdf81     1              0     
search   frjl22     0              1
search   frjl22     0              2
booking  frjl22     1              2
search   asdf81     1              1

How can I achieve this in pandas?

Upvotes: 0

Views: 135

Answers (1)

jezrael
jezrael

Reputation: 862681

Idea is reshape event column to columns and use GroupBy.cumcount per all groups with removing DataFrame.dropna and then forward filling misisng values by GroupBy.ffill with replace missing values to 0 and last add to original by DataFrame.join:

df1 = df.set_index('event', append=True)['user_id'].unstack().add_prefix('num_')
df1 = pd.concat([df1.dropna(subset=[c]).groupby(c).cumcount().add(1) 
                                             for c in df1.columns], axis=1, keys=df1.columns)
print (df1)
event  num_booking  num_search
0              1.0         NaN
1              NaN         1.0
2              NaN         2.0
3              1.0         NaN
4              NaN         1.0

df = df.join(df1.groupby(df['user_id']).ffill().fillna(0).astype(int))
print (df)
     event user_id  num_booking  num_search
0  booking  asdf81            1           0
1   search  frjl22            0           1
2   search  frjl22            0           2
3  booking  frjl22            1           2
4   search  asdf81            1           1

Upvotes: 2

Related Questions