Noah Watkins

Reputation: 5540

Compute unique groups from Pandas group-by results

I'd like to count the unique groups in the result of a Pandas group-by operation. For instance, here is an example data frame:

In [98]: df = pd.DataFrame({'A': [1,2,3,1,2,3], 'B': [10,10,11,10,10,15]})

In [99]: df.groupby('A').groups
Out[99]: {1: [0, 3], 2: [1, 4], 3: [2, 5]}

The conceptual groups are {1: [10, 10], 2: [10, 10], 3: [11, 15]}, where the index locations in the groups above are substituted with the values from column B. The first problem I've run into is how to convert those positions (e.g. [0, 3]) into values from the B column.

Once the groups are converted into groups of values from column B, I can compute the unique groups by hand. A secondary question is whether Pandas has a built-in routine for this; I haven't seen one.

Edit: updated with target output.

This is the output I would be looking for in the simplest case:

{1: [10, 10], 2: [10, 10], 3: [11, 15]}

And counting the unique groups would produce something equivalent to:

{[10, 10]: 2, [11, 15]: 1}

Upvotes: 1

Views: 128

Answers (2)

DSM

Reputation: 353059

How about:

>>> df = pd.DataFrame({'A': [1,2,3,1,2,3], 'B': [10,10,11,10,10,15]})
>>> df.groupby("A")["B"].apply(tuple).value_counts()
(10, 10)    2
(11, 15)    1
dtype: int64

or maybe

>>> df.groupby("A")["B"].apply(lambda x: tuple(sorted(x))).value_counts()
(10, 10)    2
(11, 15)    1
dtype: int64

if you don't care about the order within the group.

You can trivially call .to_dict() if you'd like, e.g.

>>> df.groupby("A")["B"].apply(tuple).value_counts().to_dict()
{(11, 15): 1, (10, 10): 2}
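
For reference, here are the same steps as a standalone script, with the intermediate Series kept so that both outputs asked for in the question are visible. This is only a sketch; the variable names per_group and counts are made up for illustration and are not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 1, 2, 3], 'B': [10, 10, 11, 10, 10, 15]})

# One tuple of B values per distinct value of A.
# Tuples are used because they are hashable and can be counted.
per_group = df.groupby("A")["B"].apply(tuple)

# Count how many groups share the same tuple of values.
counts = per_group.value_counts().to_dict()

print(per_group.to_dict())
print(counts)
```

Swapping tuple for lambda x: tuple(sorted(x)) in the apply call gives the order-insensitive variant.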

Upvotes: 2

behzad.nouri

Reputation: 77951

Maybe:

>>> df.groupby('A')['B'].aggregate(lambda ts: list(ts.values)).to_dict()
{1: [10, 10], 2: [10, 10], 3: [11, 15]}
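
If you would rather follow the question's framing literally and start from .groups, a rough sketch is below. It assumes that .groups maps each key to the index labels of its rows (with the default index these coincide with positions), so .loc is used for the lookup. The name value_groups is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 1, 2, 3], 'B': [10, 10, 11, 10, 10, 15]})

# .groups maps each key of A to the index labels of its rows;
# .loc turns those labels back into values from column B.
value_groups = {
    key: df.loc[labels, 'B'].tolist()
    for key, labels in df.groupby('A').groups.items()
}

print(value_groups)
```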

For counting the groups, you need to convert each group to a tuple, because lists are not hashable:

>>> ts = df.groupby('A')['B'].aggregate(lambda ts: tuple(ts.values))
>>> ts.value_counts().to_dict()
{(11, 15): 1, (10, 10): 2}
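
To see why the tuple conversion matters, here is a small sketch in plain Python, without pandas, that counts the value groups "by hand" with collections.Counter. The groups dict is simply the target output from the question, hard-coded for illustration.

```python
from collections import Counter

# Target output from the question, hard-coded for illustration.
groups = {1: [10, 10], 2: [10, 10], 3: [11, 15]}

# Counting the lists directly fails, because lists cannot be dict keys.
list_error = None
try:
    Counter(groups.values())
except TypeError as exc:
    list_error = str(exc)

# Converting each list to a tuple makes it hashable, so counting works.
counts = Counter(tuple(values) for values in groups.values())

print(list_error)
print(dict(counts))
```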

Upvotes: 1
