Reputation: 5540
I'd like to count the unique groups from the result of a Pandas group-by operation. For instance, here is an example data frame.
In [98]: df = pd.DataFrame({'A': [1,2,3,1,2,3], 'B': [10,10,11,10,10,15]})
In [99]: df.groupby('A').groups
Out[99]: {1: [0, 3], 2: [1, 4], 3: [2, 5]}
The conceptual groups are {1: [10, 10], 2: [10, 10], 3: [11, 15]}, where the index locations in the groups above are substituted with the corresponding values from column B. The first problem I've run into is how to convert those positions (e.g. [0, 3]) into values from the B column.
Given the ability to convert the groups into value groups from column B, I can compute the unique groups by hand. A secondary question is whether Pandas has a built-in routine for this, which I haven't seen.
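For the first sub-question, one possible sketch (not necessarily the idiomatic route) is to take the index labels that .groups yields and look them up in column B with .loc:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 1, 2, 3], 'B': [10, 10, 11, 10, 10, 15]})

# .groups maps each group key to the index labels of its rows;
# use those labels to pull the matching values out of column B.
value_groups = {key: df.loc[idx, 'B'].tolist()
                for key, idx in df.groupby('A').groups.items()}
print(value_groups)  # {1: [10, 10], 2: [10, 10], 3: [11, 15]}
```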
Edit: updated with the target output.
This is the output I would be looking for in the simplest case:
{1: [10, 10], 2: [10, 10], 3: [11, 15]}
And counting the unique groups would produce something equivalent to:
{[10, 10]: 2, [11, 15]: 1}
Upvotes: 1
Views: 128
Reputation: 353059
How about:
>>> df = pd.DataFrame({'A': [1,2,3,1,2,3], 'B': [10,10,11,10,10,15]})
>>> df.groupby("A")["B"].apply(tuple).value_counts()
(10, 10) 2
(11, 15) 1
dtype: int64
or maybe
>>> df.groupby("A")["B"].apply(lambda x: tuple(sorted(x))).value_counts()
(10, 10) 2
(11, 15) 1
dtype: int64
if you don't care about the order within the group.
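To see why the sorted variant can matter, consider a hypothetical frame (df2 below is made up for illustration) where two groups hold the same values in different orders:

```python
import pandas as pd

# Groups 1 and 2 contain the same B values, but in different row order.
df2 = pd.DataFrame({'A': [1, 2, 1, 2], 'B': [10, 11, 11, 10]})

# Without sorting, (10, 11) and (11, 10) are counted as distinct groups...
unsorted_counts = df2.groupby("A")["B"].apply(tuple).value_counts().to_dict()
# ...while sorting within each group collapses them into one.
sorted_counts = df2.groupby("A")["B"].apply(lambda x: tuple(sorted(x))).value_counts().to_dict()

print(unsorted_counts)  # {(10, 11): 1, (11, 10): 1}
print(sorted_counts)    # {(10, 11): 2}
```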
You can trivially call .to_dict() if you'd like, e.g.
>>> df.groupby("A")["B"].apply(tuple).value_counts().to_dict()
{(11, 15): 1, (10, 10): 2}
Upvotes: 2
Reputation: 77951
maybe:
>>> df.groupby('A')['B'].aggregate(lambda ts: list(ts.values)).to_dict()
{1: [10, 10], 2: [10, 10], 3: [11, 15]}
For counting the groups you need to convert to tuples, because lists are not hashable:
>>> ts = df.groupby('A')['B'].aggregate(lambda ts: tuple(ts.values))
>>> ts.value_counts().to_dict()
{(11, 15): 1, (10, 10): 2}
Upvotes: 1