Dmitriy Kisil

Reputation: 2998

Join strings in each group and assign back to the original DataFrame

I have a DataFrame with two columns: user and lang. Each user knows one or more languages:

     lang     user
0  Python     Mike
1   Scala     Mike
2       R     John
3   Julia  Michael
4    Java  Michael

For each row, I need all the languages that the row's user knows. I can join the languages per user like this:

df.groupby('user')['lang'].apply(lambda x:', '.join(x)).reset_index()

But I get this:

      user           lang
0     John              R
1  Michael    Julia, Java
2     Mike  Python, Scala

Instead of what I want:

            lang     user
0  Python, Scala     Mike
1  Python, Scala     Mike
2              R     John
3    Julia, Java  Michael
4    Julia, Java  Michael

Code to reproduce:

import pandas as pd

df = pd.DataFrame({"lang":["Python","Scala","R","Julia","Java"],
                   "user":["Mike","Mike","John","Michael","Michael"]})
print(df)

Upvotes: 1

Views: 41

Answers (1)

cs95

Reputation: 402814

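Use transform to "broadcast" the groupby result to each row in the input. Unlike apply followed by reset_index, which returns one row per group, transform returns a result aligned with the original index, so it can be assigned straight back to the DataFrame.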

df['lang'] = df.groupby('user')['lang'].transform(', '.join)
df
            lang     user
0  Python, Scala     Mike
1  Python, Scala     Mike
2              R     John
3    Julia, Java  Michael
4    Julia, Java  Michael
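
If it helps to see why this works, here is a minimal sketch (not part of the original answer) that compares transform with a two-step aggregate-then-map route. It assumes the same df as in the question; the column name all_langs is hypothetical and is only used so that the original lang column is not overwritten.

```python
import pandas as pd

df = pd.DataFrame({"lang": ["Python", "Scala", "R", "Julia", "Java"],
                   "user": ["Mike", "Mike", "John", "Michael", "Michael"]})

# transform returns a Series aligned with df's index: one joined string per row.
joined = df.groupby("user")["lang"].transform(", ".join)

# Equivalent two-step route: aggregate once per user, then map the
# per-user strings back onto the rows through the "user" column.
per_user = df.groupby("user")["lang"].agg(", ".join)
mapped = df["user"].map(per_user)

# "all_langs" is a hypothetical column name; it leaves "lang" untouched.
df["all_langs"] = joined
print(df)
```

Both routes should give the same values here. Writing to a separate column is just one option; assigning to df['lang'] as above is fine if the per-row languages are no longer needed.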

Upvotes: 4
