Reputation: 2998
I have a dataframe with two columns, user and lang. Each user knows one or more languages:
     lang     user
0  Python     Mike
1   Scala     Mike
2       R     John
3   Julia  Michael
4    Java  Michael
For each row, I need all the languages that the row's user knows. I can group them per user like this:
df.groupby('user')['lang'].apply(lambda x:', '.join(x)).reset_index()
But that collapses the result to one row per user:
      user           lang
0     John              R
1  Michael    Julia, Java
2     Mike  Python, Scala
What I want instead keeps every original row:
           lang     user
0  Python,Scala     Mike
1  Python,Scala     Mike
2             R     John
3    Julia,Java  Michael
4    Julia,Java  Michael
Code to reproduce:
import pandas as pd
df = pd.DataFrame({"lang": ["Python", "Scala", "R", "Julia", "Java"],
                   "user": ["Mike", "Mike", "John", "Michael", "Michael"]})
print(df)
Upvotes: 1
Views: 41
Reputation: 402814
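Use transform to "broadcast" the groupby result back to each row of the input. Unlike the groupby/apply in the question, which returns one value per user, transform returns a result with the same index as the original DataFrame, so it can be assigned directly as a column: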
df['lang'] = df.groupby('user')['lang'].transform(', '.join)
df
            lang     user
0  Python, Scala     Mike
1  Python, Scala     Mike
2              R     John
3    Julia, Java  Michael
4    Julia, Java  Michael
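A self-contained version of the answer, for trying it out. It writes to a new column, langs (a name chosen here for illustration, not from the answer), so the original one-language-per-row values are kept, and it checks that the transformed result shares the original row index, which is what allows it to be assigned back directly.

```python
import pandas as pd

df = pd.DataFrame({"lang": ["Python", "Scala", "R", "Julia", "Java"],
                   "user": ["Mike", "Mike", "John", "Michael", "Michael"]})

# transform calls ', '.join on each user's group of languages and repeats
# the joined string on every row belonging to that group.
langs = df.groupby('user')['lang'].transform(', '.join)

# The result is indexed like df (5 rows), not like the groups (3 users).
assert langs.index.equals(df.index)

# Store it in a new (illustrative) column so the original data survives.
df['langs'] = langs
print(df)
```

Within each group the languages are joined in their original row order, so Mike gets "Python, Scala" rather than an alphabetized list.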
Upvotes: 4
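Another option, sketched here as an alternative to transform rather than taken from the answer, reuses the question's aggregation (written with agg, which behaves like the question's apply for this purpose). The aggregated result is a Series indexed by user with one entry per user, which is why the question's output had only three rows. Series.map can look up each row's user in that Series to spread the joined string back over all five rows. The column name langs is chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"lang": ["Python", "Scala", "R", "Julia", "Java"],
                   "user": ["Mike", "Mike", "John", "Michael", "Michael"]})

# One joined string per user, indexed by user (3 entries, sorted by key).
per_user = df.groupby('user')['lang'].agg(', '.join)

# Look up each row's user in per_user; the result is aligned with df's rows.
df['langs'] = df['user'].map(per_user)
print(df)
```

transform does the group-and-broadcast in a single step; the map version can be convenient when the per-user Series is also needed on its own.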