Reputation: 1109
I would like to count the unique observations by a group in a pandas dataframe and create a new column that has the unique count. Importantly, I would not like to reduce the rows in the dataframe; effectively performing something similar to a window function in SQL.
df = pd.DataFrame({
'uID': ['James', 'Henry', 'Abe', 'James', 'Henry', 'Brian', 'Claude', 'James'],
'mID': ['A', 'B', 'A', 'B', 'A', 'A', 'A', 'C']
})
df.groupby('mID')['uID'].nunique()
Will get the unique count per group, but it summarises (reduces the rows), I would effectively like to do something along the lines of:
df['ncount'] = df.groupby('mID')['uID'].transform('nunique')
(this obviously does not work)
It is possible to accomplish the desired outcome by taking the unique summarised dataframe and joining it to the original dataframe but I am wondering if there is a more minimal solution.
Thanks
Upvotes: 5
Views: 5919
Reputation: 402263
GroupBy.transform('nunique')
On v0.23.4
, your solution works for me.
df['ncount'] = df.groupby('mID')['uID'].transform('nunique')
df
uID mID ncount
0 James A 5
1 Henry B 2
2 Abe A 5
3 James B 2
4 Henry A 5
5 Brian A 5
6 Claude A 5
7 James C 1
GroupBy.nunique
+ pd.Series.map
Additionally, with your existing solution, you could map
the series back to mID
:
df['ncount'] = df.mID.map(df.groupby('mID')['uID'].nunique())
df
uID mID ncount
0 James A 5
1 Henry B 2
2 Abe A 5
3 James B 2
4 Henry A 5
5 Brian A 5
6 Claude A 5
7 James C 1
Upvotes: 6
Reputation: 11105
You are very close!
df['ncount'] = df.groupby('mID')['uID'].transform(pd.Series.nunique)
uID mID ncount
0 James A 5
1 Henry B 2
2 Abe A 5
3 James B 2
4 Henry A 5
5 Brian A 5
6 Claude A 5
7 James C 1
Upvotes: 3