Petr Petrov
Petr Petrov

Reputation: 4432

Pandas: sum values from column to unique values

I have dataframe

ID    time    coeff   category
111    12      1.5     shop
111    15      1.5       shop
222    12      0.8     shop
222    18      0.8     shop

I need to get

category    unique_users
 shop         2.3

If I try

result = df.groupby(['category']).agg({'ID': pd.Series.nunique}).rename(columns={ 'member_id': 'unique_users'}).reset_index()

I get 2, because it returns unique of df.ID. How can I sum values in df.coeff for unique df.ID?

Upvotes: 5

Views: 8777

Answers (3)

BENY
BENY

Reputation: 323226

By using drop_duplicates

df.drop_duplicates(['ID','category']).groupby('category').agg({'coeff':'sum'})
    Out[763]: 
              coeff
    category       
    shop        2.3

Upvotes: 4

boot-scootin
boot-scootin

Reputation: 12515

This ought to do what you want:

df.drop_duplicates(subset=['ID']).groupby(['category']).agg(sum).drop(['ID', 'time'], axis=1)

Out[133]: 
          coeff
category       
shop        2.3

Upvotes: 1

Zero
Zero

Reputation: 76917

Here's one way

In [2314]: (df.groupby('category')
              .apply(lambda x: x.groupby('ID').coeff.first().sum())
              .reset_index(name='unique_users'))
Out[2314]:
  category  unique_users
0     shop           2.3

Upvotes: 9

Related Questions