Reputation: 95
I have a dataframe which looks like:
| User    | Text  | Effort  |
|---------|-------|---------|
| user122 | TextA | 2 Weeks |
| user124 | TextB | 2 Weeks |
| user125 | TextC | 3 Weeks |
| user126 | TextD | 2 Weeks |
| user126 | TextE | 2 Weeks |
My goal is to group the table by Effort and count the unique users in each group. I am able to do that with:
df.groupby(['Effort']).agg({"User": pd.Series.nunique})
This results in the following table:
| Effort  | User |
|---------|------|
| 2 Weeks | 3    |
| 3 Weeks | 1    |
However, by doing so I lose the Text column information. Another solution I tried is to keep the first occurrence of that column, but I am still unhappy because I lose something along the way.
Question: Is there any way to keep my initial dataframe without losing any rows or columns while still grouping by Effort?
Upvotes: 0
Views: 1287
Reputation: 42886
If you ask me, the best option you have is transform. This way you keep the shape of your original data but still get the results of a groupby.
df['Nunique'] = df.groupby('Effort')['User'].transform('nunique')
      User   Text   Effort  Nunique
0  user122  TextA  2 Weeks        3
1  user124  TextB  2 Weeks        3
2  user125  TextC  3 Weeks        1
3  user126  TextD  2 Weeks        3
4  user126  TextE  2 Weeks        3
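For completeness, here is a minimal self-contained sketch (rebuilding the example dataframe from the question, with the same column names) that reproduces the result above:

import pandas as pd

# Rebuild the example dataframe from the question
df = pd.DataFrame({
    'User': ['user122', 'user124', 'user125', 'user126', 'user126'],
    'Text': ['TextA', 'TextB', 'TextC', 'TextD', 'TextE'],
    'Effort': ['2 Weeks', '2 Weeks', '3 Weeks', '2 Weeks', '2 Weeks'],
})

# transform broadcasts the per-group nunique back to every row,
# so no rows or columns are lost
df['Nunique'] = df.groupby('Effort')['User'].transform('nunique')
print(df)

Unlike agg, transform returns a result indexed like the original frame, which is why it can be assigned straight back as a new column.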
Upvotes: 4