Reputation: 7313
I want to know how many unique groups I have so that I can perform calculations on them.
Given a groupby object called dfgroup, how do we find the number of groups?
Upvotes: 97
Views: 83202
Reputation: 402814
ngroups
Newer versions of the groupby API (pandas >= 0.23) provide this (undocumented) attribute which stores the number of groups in a GroupBy object.
import pandas as pd

# setup
df = pd.DataFrame({'A': list('aabbcccd')})
dfg = df.groupby('A')
# call `.ngroups` on the GroupBy object
dfg.ngroups
# 4
Note that this is different from GroupBy.groups, which returns the actual groups themselves.
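To make the difference concrete, here is a minimal sketch using the same toy frame as above. The variable names (n_groups, group_map, and so on) are only illustrative and not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'A': list('aabbcccd')})
dfg = df.groupby('A')

# ngroups is a plain integer: the number of groups
n_groups = dfg.ngroups

# groups is a dict-like mapping: group key -> row labels in that group
group_map = dfg.groups

group_keys = sorted(group_map.keys())   # the distinct keys of column A
rows_in_c = list(group_map['c'])        # row labels belonging to group 'c'

print(n_groups, group_keys, rows_in_c)
```

So ngroups answers "how many?", while groups answers "which rows belong where?".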
len?
As noted in BrenBarn's answer, you could use len(dfg) to get the number of groups. But you shouldn't. Looking at the implementation of GroupBy.__len__ (which is what len() calls internally), we see that __len__ makes a call to GroupBy.groups, which returns a dictionary of grouped indices:
dfg.groups
{'a': Int64Index([0, 1], dtype='int64'),
'b': Int64Index([2, 3], dtype='int64'),
'c': Int64Index([4, 5, 6], dtype='int64'),
'd': Int64Index([7], dtype='int64')}
Depending on the number of groups in your operation, generating the dictionary only to find its length is a wasteful step. ngroups, on the other hand, is a stored property that can be accessed in constant time.
Using len this way is documented under GroupBy object attributes. The issue with len, however, is that for a GroupBy object with a lot of groups, it can take a lot longer.
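If you want to check this on your own machine, a rough timing sketch might look like the following. The data here is made up (200,000 rows with up to 50,000 distinct keys), and the actual numbers will depend on your pandas version and hardware, so treat it as an experiment rather than a benchmark.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: many rows spread over many distinct keys
rng = np.random.default_rng(0)
big = pd.DataFrame({'key': rng.integers(0, 50_000, size=200_000)})
big_g = big.groupby('key')

# Both approaches should agree on the number of groups
count_via_ngroups = big_g.ngroups
count_via_len = len(big_g)

# Time repeated lookups; results vary by pandas version and machine
t_ngroups = timeit.timeit(lambda: big_g.ngroups, number=5)
t_len = timeit.timeit(lambda: len(big_g), number=5)
print(f"ngroups: {t_ngroups:.4f}s  len: {t_len:.4f}s")
```

Either way the two counts match, so the choice between them is about cost, not correctness.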
If what you actually want is the size of each group, you're in luck. We have a function for that: GroupBy.size. But please note that size counts NaNs as well. If you don't want NaNs counted, use GroupBy.count instead.
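A small sketch of the size versus count difference, assuming a made-up frame with a value column B that contains some NaNs:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: grouping column A, value column B with missing values
df_nan = pd.DataFrame({
    'A': ['a', 'a', 'b', 'b', 'b'],
    'B': [1.0, np.nan, 2.0, np.nan, np.nan],
})
g = df_nan.groupby('A')

# size: number of rows in each group, NaN rows included
sizes = g.size()

# count: number of non-NaN values of B in each group
counts = g['B'].count()

print(sizes)
print(counts)
```

Here group 'b' has three rows, but only one of them carries a non-NaN value in B, so size and count disagree.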
Upvotes: 129
Reputation: 251438
As documented, you can get the number of groups with len(dfgroup).
Upvotes: 67