Reputation: 95
I have a dataframe which looks like:
| User    | Text  | Effort  |
|---------|-------|---------|
| user122 | TextA | 2 Weeks |
| user124 | TextB | 2 Weeks |
| user125 | TextC | 3 Weeks |
| user126 | TextD | 2 Weeks |
| user126 | TextE | 2 Weeks |
My goal is to group the table by Effort and count the unique users in each group. I am able to do that with:
df.groupby(['Effort']).agg({"User": pd.Series.nunique})
This results in the following table:
| Effort  | User |
|---------|------|
| 2 Weeks | 3    |
| 3 Weeks | 1    |
However, by doing so I lose the Text column information. Another solution I tried is to keep the first occurrence of that column, but I am still unhappy because I lose something along the way.
Question: Is there any way to keep my initial dataframe without losing any rows or columns while still grouping by Effort?
Upvotes: 0
Views: 1287
Reputation: 42886
If you ask me, the best option you have is transform. This way you keep the shape of your original data but still get the results of a groupby.
df['Nunique'] = df.groupby('Effort')['User'].transform('nunique')
      User   Text   Effort  Nunique
0  user122  TextA  2 Weeks        3
1  user124  TextB  2 Weeks        3
2  user125  TextC  3 Weeks        1
3  user126  TextD  2 Weeks        3
4  user126  TextE  2 Weeks        3
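For completeness, here is a minimal self-contained sketch (rebuilding the example dataframe from the question, with the same column names) that reproduces the result above:

import pandas as pd

# Rebuild the example dataframe from the question
df = pd.DataFrame({
    'User': ['user122', 'user124', 'user125', 'user126', 'user126'],
    'Text': ['TextA', 'TextB', 'TextC', 'TextD', 'TextE'],
    'Effort': ['2 Weeks', '2 Weeks', '3 Weeks', '2 Weeks', '2 Weeks'],
})

# transform broadcasts the per-group nunique back to every row,
# so no rows or columns are lost
df['Nunique'] = df.groupby('Effort')['User'].transform('nunique')
print(df)

Unlike agg, transform returns a result indexed like the original frame, which is why it can be assigned straight back as a new column.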
Upvotes: 4