Reputation: 326
Consider the sample df
below:
import pandas as pd
d = {'id': ["A123", "A123", "A123"],
'text1': ["this is a sample", "this is a sample", "this is a sample"],
'text2': ["sing with me", "one two three", "sing with me"]}
df = pd.DataFrame(data=d)
I'm trying to take the id column id
and concat the unique
values of each of the text columns, so that the sample df:
id text1 text2
A123 this is a sample sing with me
A123 this is a sample one two three
A123 this is a sample sing with me
Will look like this:
id combined_text
A123 this is a sample | sing with me | one two three
I tried all sort of combination of " | ".join(x)
and agg
and more... I can take d['id','text1'].unique()
and d['id','text2'].unique()
and later merge, but there must be a more efficient way.
Upvotes: 0
Views: 1404
Reputation: 25354
I'd suggest using stack() and unique() here. I also broke this up by group, using the id column.
import pandas as pd
d = {'id': ["A123", "A123", "A123"],
'text1': ["this is a sample", "this is a sample", "this is a sample"],
'text 2': ["sing with me", "one two three", "sing with me"]}
df = pd.DataFrame(data=d)
df = pd.DataFrame(
df.groupby('id')
.apply(
lambda df: ' | '.join(df[['text1', 'text 2']].stack().unique())
),
columns=['combined_text'])
Upvotes: 1