Count of unique values per group as new column with pandas

Question

I would like to count the unique observations per group in a pandas DataFrame and store that count in a new column. Importantly, I do not want to reduce the number of rows in the DataFrame; effectively, I want something similar to a window function in SQL.

import pandas as pd

df = pd.DataFrame({
    'uID': ['James', 'Henry', 'Abe', 'James', 'Henry', 'Brian', 'Claude', 'James'],
    'mID': ['A', 'B', 'A', 'B', 'A', 'A', 'A', 'C']
})

df.groupby('mID')['uID'].nunique()

This gets the unique count per group, but it summarises the data (reduces the rows). I would like to do something along the lines of:

df['ncount'] = df.groupby('mID')['uID'].transform('nunique')

(this obviously does not work)

It is possible to get the desired outcome by computing the summarised unique counts and joining them back to the original DataFrame, but I am wondering if there is a more minimal solution.
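
For reference, the join-based workaround described above might look roughly like the following. This is only a sketch: the names `counts` and `merged` are chosen here for illustration and are not from the original post.

```python
import pandas as pd

df = pd.DataFrame({
    'uID': ['James', 'Henry', 'Abe', 'James', 'Henry', 'Brian', 'Claude', 'James'],
    'mID': ['A', 'B', 'A', 'B', 'A', 'A', 'A', 'C']
})

# Summarise: one row per mID holding the number of distinct uID values.
# `counts` is an illustrative name, not part of the original question.
counts = df.groupby('mID')['uID'].nunique().rename('ncount').reset_index()

# Join the summary back; a left join keeps every original row in its original order.
merged = df.merge(counts, on='mID', how='left')
print(merged)
```

It works, but it needs an intermediate frame and an explicit join key, which is the overhead the question is trying to avoid.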

Thanks

cs95 · Accepted Answer

`GroupBy.transform('nunique')`

On pandas v0.23.4, your solution works for me:

df['ncount'] = df.groupby('mID')['uID'].transform('nunique')
df
      uID mID  ncount
0   James   A       5
1   Henry   B       2
2     Abe   A       5
3   James   B       2
4   Henry   A       5
5   Brian   A       5
6  Claude   A       5
7   James   C       1

`GroupBy.nunique` + `pd.Series.map`

Alternatively, starting from your existing `nunique` call, you could map the resulting Series back onto `mID`:

df['ncount'] = df.mID.map(df.groupby('mID')['uID'].nunique())
df
      uID mID  ncount
0   James   A       5
1   Henry   B       2
2     Abe   A       5
3   James   B       2
4   Henry   A       5
5   Brian   A       5
6  Claude   A       5
7   James   C       1
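
As a quick sanity check, the two approaches can be computed side by side and compared. This is a minimal sketch, assuming a pandas version where `transform('nunique')` is supported; `via_transform` and `via_map` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'uID': ['James', 'Henry', 'Abe', 'James', 'Henry', 'Brian', 'Claude', 'James'],
    'mID': ['A', 'B', 'A', 'B', 'A', 'A', 'A', 'C']
})

# Approach 1: broadcast the per-group unique count back to every row.
via_transform = df.groupby('mID')['uID'].transform('nunique')

# Approach 2: compute the per-group counts once, then look them up by mID.
via_map = df['mID'].map(df.groupby('mID')['uID'].nunique())

# Both Series share the original row index, so they can be compared element-wise.
print((via_transform == via_map).all())
```

Either result can be assigned to `df['ncount']`; neither changes the number of rows.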
