Reputation: 81
Category Data Age Location
A 30 44 212 Street
A 20 54 212 Street
A 20 48 212 Street
A 10 49 209 Street
A 40 12 209 Street
A 30 21 209 Street
A 30 32 220 Street
A 35 24 220 Street
A 25 22 220 Street
this is part of my pandas df. I wish to group ober category and sum data and keep unique location in list
df.groupby(['Category']).agg({'Location': pd.Series.unique, 'Data': 'sum'})
This is throwing value error for Location.
ValueError: Function does not reduce
What am I doing wrong? I would like to put unique location in groupby column..
Upvotes: 2
Views: 881
Reputation: 10970
You could use a set
data structure for unique elements
df.groupby(['Category']).agg({'Location': set, 'Data': 'sum'})
Location Data
Category
A {220 Street, 209 Street, 212 Street} 240
In your code, the issue is that you have pd.Series.unique
as the aggregator function. This function does not take any input rather it is a function of the class Series
. So, like @jezrael mentioned lambda x: list(x.unique())
works because x
is a Series
and it is converted to a list
because pandas do not accept ndarray
as output of the aggregator function.
Upvotes: 2
Reputation: 863701
You can convert unique values to list, because in your solution and without convert to list get same error:
ValueError: Must produce aggregated value
#your solution should be changed with convert to list
df1 = df.groupby(['Category']).agg({'Location': lambda x: list(pd.Series.unique(x)),
'Data': 'sum'})
Or:
df1 = df.groupby(['Category']).agg({'Location': lambda x: list(x.unique()), 'Data': 'sum'})
print (df1)
Location Data
Category
A [212 Street, 209 Street, 220 Street] 240
But if dont need lists add DataFrame.explode
:
df1 = (df.groupby(['Category'])
.agg({'Location': lambda x: list(x.unique()), 'Data': 'sum'})
.explode('Location'))
print (df1)
Location Data
Category
A 212 Street 240
A 209 Street 240
A 220 Street 240
Or use solution with DataFrame.drop_duplicates
and DataFrame.join
:
s = df.groupby(['Category'])['Data'].sum()
print (s)
Category
A 240
Name: Data, dtype: int64
df1 = df.drop_duplicates(['Category','Location']).drop('Data',axis=1).join(s, on='Category')
print (df1)
Category Age Location Data
0 A 44 212 Street 240
3 A 49 209 Street 240
6 A 32 220 Street 240
Upvotes: 3