add-semi-colons
add-semi-colons

Reputation: 18800

Pandas keeping other columns after GroupBy - Not a sum

Following data set represent a user's browsing behavior.

  user_id session_id        keyword        real_time_stamp           presented    clicked
0  10010  s2342009n           camera 2020-03-01 05:00:19.195000+00:00  101           101
1  10010  s2342009n           camera 2020-03-01 05:00:19.195000+00:00  102           None
2  10010  s2342009n           camera 2020-03-01 05:00:19.195000+00:00  103           None
3  10010  s2342009n           camera 2020-03-01 05:00:19.195000+00:00  104           None
4  10010  s2342009n           camera 2020-03-01 05:00:19.195000+00:00  105           None
5  10010  s2342009n           camera 2020-03-01 05:00:19.195000+00:00  106           None
6  10010  s2342009n           camera 2020-03-01 05:00:19.195000+00:00  107           None
7  10010  s2342009n           camera 2020-03-01 05:00:19.195000+00:00  108           None
8  10010  s2342009n           camera 2020-03-01 05:00:19.195000+00:00  109           None
9  10010  s2342009n           camera 2020-03-01 05:00:19.195000+00:00  110           None

Presented column shows what items were presented to the user and and from the presented items which one was clicked. In above example 1st items was clicked. Goal is to bring this information into one row but also keep the time stamp.

By doing following group_by gives the structure but its without real_time_stamp and clicked. Once you ad real_time_stamp to the group by you don't get the "aggregated" version.

df_collapse = df.groupby(['user_id', 'session_id', 'keyword'])['presented'].apply(lambda x: '|'.join(x)).reset_index()

My attempt is to get following structure:

   user_id session_id        keyword        real_time_stamp                 presented                               clicked
     10010  s2342009n       camera     2020-03-01 05:00:19.195000+00:00  101|102|103|104|105|106|107|108|109|110      101

Upvotes: 0

Views: 31

Answers (1)

BENY
BENY

Reputation: 323226

Check with transform

s=df.groupby(['user_id', 'session_id', 'keyword'])['presented'].transform(lambda x: '|'.join(x.astype(str)))
df['New']=s

if agg to one

df.groupby(['user_id', 'session_id', 'keyword']).\
    agg({'presented':'|'.join,
         'real_time_stamp':'first',
         'clicked':'first'})

Upvotes: 2

Related Questions