How to join all unique strings from multiple columns of each group in pandas

Question

Consider the sample df below:

import pandas as pd

d = {'id': ["A123", "A123", "A123"],
     'text1': ["this is a sample", "this is a sample", "this is a sample"],
     'text2': ["sing with me", "one two three", "sing with me"]}
df = pd.DataFrame(data=d)

I'm trying to take the id column id and concat the unique values of each of the text columns, so that the sample df:

id    text1              text2
A123  this is a sample   sing with me
A123  this is a sample   one two three
A123  this is a sample   sing with me

Will look like this:

id    combined_text
A123  this is a sample | sing with me | one two three

I tried all sort of combination of " | ".join(x) and agg and more... I can take d['id','text1'].unique() and d['id','text2'].unique() and later merge, but there must be a more efficient way.

Nick ODell · Accepted Answer

I'd suggest using stack() and unique() here. I also broke this up by group, using the id column.

import pandas as pd

d = {'id': ["A123", "A123", "A123"],
     'text1': ["this is a sample", "this is a sample", "this is a sample"],
     'text 2': ["sing with me", "one two three", "sing with me"]}
df = pd.DataFrame(data=d)

df = pd.DataFrame(
    df.groupby('id')
      .apply(
          lambda df: ' | '.join(df[['text1', 'text 2']].stack().unique())
      ),
columns=['combined_text'])

How to join all unique strings from multiple columns of each group in pandas

Answers (1)

Related Questions