Reputation: 18800
Following data set represent a user's browsing behavior.
user_id session_id keyword real_time_stamp presented clicked
0 10010 s2342009n camera 2020-03-01 05:00:19.195000+00:00 101 101
1 10010 s2342009n camera 2020-03-01 05:00:19.195000+00:00 102 None
2 10010 s2342009n camera 2020-03-01 05:00:19.195000+00:00 103 None
3 10010 s2342009n camera 2020-03-01 05:00:19.195000+00:00 104 None
4 10010 s2342009n camera 2020-03-01 05:00:19.195000+00:00 105 None
5 10010 s2342009n camera 2020-03-01 05:00:19.195000+00:00 106 None
6 10010 s2342009n camera 2020-03-01 05:00:19.195000+00:00 107 None
7 10010 s2342009n camera 2020-03-01 05:00:19.195000+00:00 108 None
8 10010 s2342009n camera 2020-03-01 05:00:19.195000+00:00 109 None
9 10010 s2342009n camera 2020-03-01 05:00:19.195000+00:00 110 None
Presented column shows what items were presented to the user and and from the presented items which one was clicked. In above example 1st items was clicked. Goal is to bring this information into one row but also keep the time stamp.
By doing following group_by
gives the structure but its without real_time_stamp
and clicked
. Once you ad real_time_stamp
to the group by you don't get the "aggregated" version.
df_collapse = df.groupby(['user_id', 'session_id', 'keyword'])['presented'].apply(lambda x: '|'.join(x)).reset_index()
My attempt is to get following structure:
user_id session_id keyword real_time_stamp presented clicked
10010 s2342009n camera 2020-03-01 05:00:19.195000+00:00 101|102|103|104|105|106|107|108|109|110 101
Upvotes: 0
Views: 31
Reputation: 323226
Check with transform
s=df.groupby(['user_id', 'session_id', 'keyword'])['presented'].transform(lambda x: '|'.join(x.astype(str)))
df['New']=s
if agg to one
df.groupby(['user_id', 'session_id', 'keyword']).\
agg({'presented':'|'.join,
'real_time_stamp':'first',
'clicked':'first'})
Upvotes: 2