Reputation: 559
I have a dataframe that has the information, over time, of the occurences of two events (booking
and search
) for user ID's. It looks like this:
event user_id
booking asdf81
search frjl22
search frjl22
booking frjl22
search asdf81
I'd like to calculate two columns based on this - num_bookings
and num_searches
.
So, the output dataframe would look like this:
event user_id num_bookings num_searches
booking asdf81 1 0
search frjl22 0 1
search frjl22 0 2
booking frjl22 1 2
search asdf81 1 1
How can I achieve this in pandas?
Upvotes: 0
Views: 135
Reputation: 862681
Idea is reshape event
column to columns and use GroupBy.cumcount
per all groups with removing DataFrame.dropna
and then forward filling misisng values by GroupBy.ffill
with replace missing values to 0
and last add to original by DataFrame.join
:
df1 = df.set_index('event', append=True)['user_id'].unstack().add_prefix('num_')
df1 = pd.concat([df1.dropna(subset=[c]).groupby(c).cumcount().add(1)
for c in df1.columns], axis=1, keys=df1.columns)
print (df1)
event num_booking num_search
0 1.0 NaN
1 NaN 1.0
2 NaN 2.0
3 1.0 NaN
4 NaN 1.0
df = df.join(df1.groupby(df['user_id']).ffill().fillna(0).astype(int))
print (df)
event user_id num_booking num_search
0 booking asdf81 1 0
1 search frjl22 0 1
2 search frjl22 0 2
3 booking frjl22 1 2
4 search asdf81 1 1
Upvotes: 2