Reputation: 4432
I have dataframe
ID time coeff category
111 12 1.5 shop
111 15 1.5 shop
222 12 0.8 shop
222 18 0.8 shop
I need to get
category unique_users
shop 2.3
If I try
result = df.groupby(['category']).agg({'ID': pd.Series.nunique}).rename(columns={ 'member_id': 'unique_users'}).reset_index()
I get 2
, because it returns unique of df.ID
. How can I sum values in df.coeff
for unique df.ID
?
Upvotes: 5
Views: 8777
Reputation: 323226
By using drop_duplicates
df.drop_duplicates(['ID','category']).groupby('category').agg({'coeff':'sum'})
Out[763]:
coeff
category
shop 2.3
Upvotes: 4
Reputation: 12515
This ought to do what you want:
df.drop_duplicates(subset=['ID']).groupby(['category']).agg(sum).drop(['ID', 'time'], axis=1)
Out[133]:
coeff
category
shop 2.3
Upvotes: 1
Reputation: 76917
Here's one way
In [2314]: (df.groupby('category')
.apply(lambda x: x.groupby('ID').coeff.first().sum())
.reset_index(name='unique_users'))
Out[2314]:
category unique_users
0 shop 2.3
Upvotes: 9