Reputation: 5473
I am using pandas groupby and want to apply a function that builds a set from the items in each group.
The following raises TypeError: 'type' object is not iterable:
df = df.groupby('col1')['col2'].agg({'size': len, 'set': set})
But the following works:
def to_set(x):
    return set(x)
df = df.groupby('col1')['col2'].agg({'size': len, 'set': to_set})
As I understand it, the two expressions are equivalent. Why does the first one not work?
Upvotes: 24
Views: 31999
Reputation: 81
Update for pandas 1.3.3: using .agg({'set': set}) produces the following error:
TypeError: Unable to infer the type of the field set
The error persists with the previously suggested .agg({'set': lambda x: set(x)}).
The reason is that set does not fulfil is_list_like in _aggregate (detailed explanation here, courtesy of @EdChum).
A solution is therefore to coerce the result to a list:
.agg({'set': lambda x: list(set(x))})
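A minimal sketch of the list-coercion idea on toy data. The frame and the name unique_values are made up for illustration, and the function is passed directly rather than through a renaming dict, since newer pandas versions may reject the dict form on a single column.

```python
import pandas as pd

# Hypothetical sample data, not from the original question.
df = pd.DataFrame({"col1": ["a", "a", "a", "b"], "col2": [1, 1, 3, 2]})

# Deduplicate each group with set(), then coerce to a list so that
# every group yields a plain list object.
unique_values = df.groupby("col1")["col2"].agg(lambda x: list(set(x)))

print(unique_values)
```

Note that the order of items inside each list is not guaranteed, because it comes from a set. Wrapping the set in sorted() instead of list() gives a deterministic order.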
Upvotes: 0
Reputation: 1450
Update for newer versions of pandas: if you get the following error
SpecificationError: nested renamer is not supported
use named aggregation (keyword arguments) instead of a dict:
df = df.groupby('col1')['col2'].agg(size=len, set=lambda x: set(x))
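For reference, here is how the keyword form might look end to end on a small frame. The sample data and the variable name out are assumptions for illustration, and exact behaviour can vary between pandas versions.

```python
import pandas as pd

# Hypothetical sample data, not from the original question.
df = pd.DataFrame({"col1": ["a", "a", "b"], "col2": [1, 2, 2]})

# Each keyword becomes an output column; the value is the aggregation function.
# Inside the lambda, set still refers to the builtin, not to the keyword name.
out = df.groupby("col1")["col2"].agg(size=len, set=lambda x: set(x))

print(out)
```

The result is a DataFrame indexed by col1 with one column per keyword. Access the second column as out["set"] rather than out.set, because attribute access can clash with existing DataFrame attributes.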
Upvotes: 5
Reputation: 42875
set, doesn't result in TypeError: 'type' object is not iterable.
It's because set is of type type, whereas to_set is of type function:
>>> type(set)
<class 'type'>
>>> def to_set(x):
...     return set(x)
...
>>> type(to_set)
<class 'function'>
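The same distinction can be checked programmatically with the standard library alone. This sketch only shows that both objects are callable while just one of them is a plain function; it does not reproduce pandas' internal dispatch.

```python
import inspect

def to_set(x):
    return set(x)

# For each object: is it callable, is it a class, is it a plain Python function?
checks = {
    "set": (callable(set), isinstance(set, type), inspect.isfunction(set)),
    "to_set": (callable(to_set), isinstance(to_set, type), inspect.isfunction(to_set)),
}

for name, (is_callable, is_class, is_function) in checks.items():
    print(name, is_callable, is_class, is_function)
```

Both are callable, so any code that treats classes and functions differently has to be inspecting the object's type rather than just calling it.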
According to the docs, .agg() expects:
arg : function or dict
    Function to use for aggregating groups.
    - If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.
    - If passed a dict, the keys must be DataFrame column names.
    Accepted Combinations are:
    - string cythonized function name
    - function
    - list of functions
    - dict of columns -> functions
    - nested dict of names -> dicts of functions
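The accepted combinations listed above can be sketched on a toy frame as follows. The data and the helper value_range are hypothetical, and the nested-dict form is left out because newer pandas versions reject it with SpecificationError: nested renamer is not supported.

```python
import pandas as pd

# Hypothetical sample data, not from the original question.
df = pd.DataFrame({"col1": ["a", "a", "b"], "col2": [1, 2, 5]})
grouped = df.groupby("col1")

def value_range(s):
    # Hypothetical helper: spread between the largest and smallest value.
    return s.max() - s.min()

by_string = grouped.agg("sum")           # string naming a built-in aggregation
by_function = grouped.agg(value_range)   # plain function
by_list = grouped.agg(["min", "max"])    # list of functions
by_dict = grouped.agg({"col2": "max"})   # dict of column -> function

print(by_list)
```

The list form returns a frame with two-level columns, here ("col2", "min") and ("col2", "max"), while the other forms keep a single col2 column.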
Upvotes: 21
Reputation: 141
Try using:
df = df.groupby('col1')['col2'].agg({'size': len, 'set': lambda x: set(x)})
Works for me.
Upvotes: 12