Reputation: 13062
I have a dataframe which has multiple rows for each particular user_id
and I'm trying to create a new dataframe that aggregates the values for each user_id
. For example, the original dataframe df
is
id user_id eventdate col1 col2
1 100 2015-10-01 2 A
2 200 2015-11-01 4 B
3 300 2015-12-01 1 A
4 100 2016-02-01 3 B
5 300 2016-03-01 6 C
From this I want a new dataframe df_agg
for each user like this:
user_id col1_sum col2_latest
100 5 B
200 4 B
300 7 C
I got the col1_sum
by doing a groupby like this:
df_agg = pd.DataFrame({'col1_sum': df.groupby('NAME')['col1'].sum()})
How do I add the col2_latest
column to this dataframe?
Upvotes: 1
Views: 48
Reputation: 862851
You need aggregate
by sum
and last
:
df_agg = df.groupby('user_id').agg({'col1':'sum','col2':'last'})
df_agg = df_agg[['col1','col2']].rename(columns={'col1':'col1_sum','col2':'col2_latest'})
.reset_index()
print (df_agg)
user_id col1_sum col2_latest
0 100 5 B
1 200 4 B
2 300 7 C
Upvotes: 1