Reputation: 925
In Pandas, there is a very clean way to count the distinct values in a column within a group by operation. For example
ex = pd.DataFrame([[1, 2, 3], [6, 7, 8], [1, 7, 9]],
columns=["A", "B", "C"]).set_index(["A", "B"])
ex.groupby(level="A").C.nunique()
will return
A
1 2
6 1
Name: C, dtype: int64
I would also like to count the distinct values in index level B
while grouping by A
. I can't find a clean way to access the levels of B
from the groupby
object. The best I've been able to come up with is:
ex.reset_index("B", drop=False).groupby(level="A").B.nunique()
which correctly returns:
A
1 2
6 1
Name: B, dtype: int64
Is there a way for me to do this on the groupby without resetting the index or using an apply
function?
Upvotes: 8
Views: 16249
Reputation: 28143
For your amusement, a not-so-easy-to-read-out-loud solution that does not use reset_index
, or apply
, or agg
, or anonymous functions. However, it does use zip
and Counter
from the standard library.
import pandas as pd
from collections import Counter
ex = pd.DataFrame([[1, 2, 3], [6, 7, 8], [1, 7, 9]],
columns=["A", "B", "C"]).set_index(["A", "B"])
A_val, nunique_B = zip(*[(k, len(Counter(v.index.labels[v.index.names.index('B')])))
for k, v in ex.groupby(level='A')])
pd.Series(nunique_B, index=pd.Int64Index(A_val, name='A'))
returns
A
1 2
6 1
dtype: int32
Also, for generality I do not assume that B
is at level 1 of the index.
Upvotes: 0
Reputation: 28956
Not sure if this is any better, but it doesn't use an apply or reset index :)
In [20]: ex.groupby(level="A").agg(lambda x: x.index.get_level_values(1).nunique())
Out[20]:
C
A
1 2
6 1
FWIW, I find it useful to break these apart when developing a complicated groupby. You can view the individual objects you'll work with by
In [24]: ex.groupby(level="A").get_group(1)
Out[24]:
C
A B
1 2 3
7 9
Together:
In [33]: (ex.groupby(level='A')
....: .C.agg({'a': lambda x: x.index.get_level_values(1).nunique(),
....: 'b': 'nunique'}))
Out[33]:
b a
A
1 2 2
6 1 1
Upvotes: 4
Reputation: 31692
IIUC you could do reset_index
for all levels, then groupby be 'A' and apply nunique
method:
res = ex.reset_index().groupby('A').agg(lambda x: x.nunique())
In [339]: res
Out[339]:
B C
A
1 2 2
6 1 1
Same solution with pivot_table
:
In [341]: ex.reset_index().pivot_table(index='A', aggfunc=lambda x: x.nunique())
Out[341]:
B C
A
1 2 2
6 1 1
Upvotes: 6