sinG20

Reputation: 151

Combine 2 labels into single label in pandas

I have a dataframe with multilabels as below:

    text    label
0   apple   a
1   apple   b
2   orange  o
3   orange  o
4   grapes  o
5   grapes  g

Expected output:

    text    label
0   apple   ab
1   orange  o
2   orange  o
3   grapes  og

I tried df.groupby('label')['text']=='apple', but it fails. I also tried df[(df.label=='a') & (df.label=='b')], which fails too. How can I select the rows that share a text and combine their labels into one?
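For reference, here is a small sketch of why the second attempt returns nothing. The frame is rebuilt from the sample above; the names `both` and `either` are only illustrative. A single cell can never equal 'a' and 'b' at the same time, so AND-ing the two masks is always empty, whereas `isin` (or `|`) selects rows matching either value.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "text": ["apple", "apple", "orange", "orange", "grapes", "grapes"],
    "label": ["a", "b", "o", "o", "o", "g"],
})

# A label cannot be 'a' and 'b' at once, so this mask is False everywhere
both = df[(df["label"] == "a") & (df["label"] == "b")]

# isin (equivalent to OR-ing the two comparisons) picks rows with either label
either = df[df["label"].isin(["a", "b"])]

print(len(both))
print(either)
```

Selecting rows this way still does not merge the labels; that needs a groupby on text, as in the answers.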

Upvotes: 1

Views: 1041

Answers (2)

Mehdi Golzadeh

Reputation: 2583

Use groupby and agg

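# Texts that carry more than one distinct label
dfagg = df.groupby('text', as_index=False).agg({'label': lambda x: x.nunique()})[lambda x: x['label'] > 1]
# Concatenate the labels of those texts ('sum' joins strings) and leave all other rows unchanged
pd.concat([df[df['text'].isin(dfagg.text)].groupby('text', as_index=False).agg({'label': 'sum'}), df[~df['text'].isin(dfagg.text)]])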
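A self-contained sketch of the same groupby/agg idea, split into named steps so it can be run as-is. The sample frame is rebuilt from the question, and the names `n_labels`, `mixed`, `uniform` and `result` are made up for illustration. It joins labels with `''.join` rather than `'sum'`, which is a substitution, not what the answer above uses.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "text": ["apple", "apple", "orange", "orange", "grapes", "grapes"],
    "label": ["a", "b", "o", "o", "o", "g"],
})

# Number of distinct labels in each row's text group
n_labels = df.groupby("text")["label"].transform("nunique")

# Groups with differing labels collapse to one row with the labels joined
mixed = (
    df[n_labels > 1]
    .groupby("text", sort=False)["label"]
    .agg("".join)
    .reset_index()
)

# Groups with a single distinct label keep all of their rows
uniform = df[n_labels == 1]

result = pd.concat([mixed, uniform], ignore_index=True)
print(result)
```

Note that the combined rows come first and the untouched rows follow, so the row order differs from the expected output in the question; sort afterwards if the order matters.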

Upvotes: 0

Chris

Reputation: 16147

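# Join the labels of each text when they differ; otherwise keep them as they are
df['label'] = df.groupby('text')['label'].transform(lambda x: ''.join(x) if len(set(x))>1 else x)
# Keep the first row of each text plus every row whose label is still a single character
df.loc[(df.groupby('text').cumcount()==0) | (df.label.str.len()==1)]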

Output

    text    label
0   apple   ab
2   orange  o
3   orange  o
4   grapes  og
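The filter above relies on the original labels being one character long. A possible variant, sketched here under the assumption that labels may be longer, tracks the combined groups with an explicit boolean mask instead. The names `mixed`, `joined` and `out` are illustrative only.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "text": ["apple", "apple", "orange", "orange", "grapes", "grapes"],
    "label": ["a", "b", "o", "o", "o", "g"],
})

grouped = df.groupby("text", sort=False)["label"]

# True for rows whose text group holds more than one distinct label
mixed = grouped.transform("nunique") > 1

# Every row gets the concatenation of all labels in its group
joined = grouped.transform(lambda s: "".join(s))

out = df.copy()
# Use the joined label only in mixed groups; keep the original label elsewhere
out["label"] = out["label"].where(~mixed, joined)

# Keep all rows of uniform groups and only the first row of each mixed group
first_in_group = df.groupby("text").cumcount() == 0
out = out[~mixed | first_in_group]
print(out)
```

On the sample data this keeps the same rows (index 0, 2, 3, 4) as the output shown above.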

Upvotes: 2
