Reputation: 7536
Inspired by this post, I would like to get a distinct count of a value in a data frame per a grouping and create a column with the distinct count values in the data frame. Like this:
Original data frame:
import pandas as pd
df = pd.DataFrame(
{'A' : ['foo', 'foo', 'foo', 'foo',
'bar', 'bar', 'bar', 'bar'],
'B' : ['foo', 'fo', 'foo', 'foo',
'bar', 'bar', 'ba', 'ba'],
'C' : [2, 4, 4, 2, 5, 4, 3, 2]})
df
A B C
0 foo foo 2
1 foo fo 4
2 foo foo 4
3 foo foo 2
4 bar bar 5
5 bar bar 4
6 bar ba 3
7 bar ba 2
Method from linked post applied:
df=df.groupby(['A','B'])['C'].apply(lambda x: len(x.unique()))
df
Result per linked post method:
A B
bar ba 2
bar 2
foo fo 1
foo 2
Name: C, dtype: int64
Desired result:
A B C Distinct Count of C per A and B
0 foo foo 2 2
1 foo fo 4 1
2 foo foo 4 2
3 foo foo 2 2
4 bar bar 5 2
5 bar bar 4 2
6 bar ba 3 2
7 bar ba 2 2
Looking at the first row, the combination of "foo" in "A" and "foo" in "B" has 2 unique values associated with it (2 and 4), resulting in a 2 in each row for that combination of values for columns A and B.
Thank in advance!
Upvotes: 2
Views: 615
Reputation: 31682
Use transform
instead of apply
because it return column with the same size as original, I couldn't find documentation on the original pandas site for that, but from help:
transform(func, *args, **kwargs) method of pandas.core.groupby.SeriesGroupBy instance
Call function producing a like-indexed Series on each group and return a Series with the transformed values
df['Distinct Count of C per A and B'] = df.groupby(['A','B'])['C'].transform(lambda x: len(x.unique()))
In [1495]: df
Out[1495]:
A B C Distinct Count of C per A and B
0 foo foo 2 2
1 foo fo 4 1
2 foo foo 4 2
3 foo foo 2 2
4 bar bar 5 2
5 bar bar 4 2
6 bar ba 3 2
7 bar ba 2 2
Upvotes: 1