Numbaker
Numbaker

Reputation: 81

Pandas groupby throwing up ValueError

Category    Data    Age     Location
A            30     44      212 Street
A            20     54      212 Street
A            20     48      212 Street
A            10     49      209 Street
A            40     12      209 Street
A            30     21      209 Street
A            30     32      220 Street
A            35     24      220 Street
A            25     22      220 Street

this is part of my pandas df. I wish to group ober category and sum data and keep unique location in list

df.groupby(['Category']).agg({'Location': pd.Series.unique, 'Data': 'sum'})
This is throwing value error for Location.
ValueError: Function does not reduce

What am I doing wrong? I would like to put unique location in groupby column..

Upvotes: 2

Views: 881

Answers (2)

Vishnudev Krishnadas
Vishnudev Krishnadas

Reputation: 10970

You could use a set data structure for unique elements

df.groupby(['Category']).agg({'Location': set, 'Data': 'sum'})

                                      Location  Data
Category                                            
A         {220 Street, 209 Street, 212 Street}   240

In your code, the issue is that you have pd.Series.unique as the aggregator function. This function does not take any input rather it is a function of the class Series. So, like @jezrael mentioned lambda x: list(x.unique()) works because x is a Series and it is converted to a list because pandas do not accept ndarray as output of the aggregator function.

Upvotes: 2

jezrael
jezrael

Reputation: 863701

You can convert unique values to list, because in your solution and without convert to list get same error:

ValueError: Must produce aggregated value

#your solution should be changed with convert to list
df1 = df.groupby(['Category']).agg({'Location': lambda x: list(pd.Series.unique(x)), 
                                    'Data': 'sum'})

Or:

df1 = df.groupby(['Category']).agg({'Location': lambda x: list(x.unique()), 'Data': 'sum'})
print (df1)
                                      Location  Data
Category                                            
A         [212 Street, 209 Street, 220 Street]   240

But if dont need lists add DataFrame.explode:

df1 = (df.groupby(['Category'])
         .agg({'Location': lambda x: list(x.unique()), 'Data': 'sum'})
         .explode('Location'))
print (df1)
            Location  Data
Category                  
A         212 Street   240
A         209 Street   240
A         220 Street   240

Or use solution with DataFrame.drop_duplicates and DataFrame.join :

s = df.groupby(['Category'])['Data'].sum()
print (s)
Category
A    240
Name: Data, dtype: int64

df1 = df.drop_duplicates(['Category','Location']).drop('Data',axis=1).join(s, on='Category')
print (df1)
  Category  Age    Location  Data
0        A   44  212 Street   240
3        A   49  209 Street   240
6        A   32  220 Street   240

Upvotes: 3

Related Questions