Reputation: 151
I have a DataFrame in which some texts carry more than one label, as below:
     text label
0   apple     a
1   apple     b
2  orange     o
3  orange     o
4  grapes     o
5  grapes     g
Expected output:
     text label
0   apple    ab
1  orange     o
2  orange     o
3  grapes    og
I tried df.groupby('label')['text']=='apple', but it failed. I also tried df[(df.label=='a') & (df.label=='b')], which failed as well. How can I select and rename the labels?
Upvotes: 1
Views: 1041
Reputation: 2583
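Use groupby and agg:
# texts that have more than one distinct label
dfagg = df.groupby('text', as_index=False).agg({'label': lambda x: x.nunique()})[lambda x: x['label'] > 1]
# concatenate the labels of those texts and keep all other rows unchanged
pd.concat([df[df['text'].isin(dfagg.text)].groupby('text', as_index=False).agg({'label': 'sum'}), df[~df['text'].isin(dfagg.text)]])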
Upvotes: 0
Reputation: 16147
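# join the labels of a text when it has more than one distinct label, otherwise leave them as they are
df['label'] = df.groupby('text')['label'].transform(lambda x: ''.join(x) if len(set(x)) > 1 else x)
# keep the first row of each text, plus every row whose label is still a single character
df.loc[(df.groupby('text').cumcount() == 0) | (df.label.str.len() == 1)]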
Output
     text label
0   apple    ab
2  orange     o
3  orange     o
4  grapes    og
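For reference, here is a self-contained sketch of the same idea that does not rely on the original labels being exactly one character long. The sample frame is rebuilt from the question. The names `multi`, `joined`, `first` and `out` are only illustrative, and the final `reset_index` is an assumption that the 0..3 index from the expected output is wanted.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "text": ["apple", "apple", "orange", "orange", "grapes", "grapes"],
    "label": ["a", "b", "o", "o", "o", "g"],
})

labels = df.groupby("text", sort=False)["label"]

# True for every row whose text has more than one distinct label
multi = labels.transform("nunique") > 1
# Labels of each text concatenated, broadcast back to every row of that text
joined = labels.transform(lambda s: "".join(s))
# True for the first row of each text
first = labels.cumcount() == 0

# Use the joined label only for multi-label texts, then drop their extra rows
out = (
    df.assign(label=joined.where(multi, df["label"]))
      .loc[first | ~multi]
      .reset_index(drop=True)
)
print(out)
```

The row filter uses the `multi` mask instead of `str.len() == 1`, so it should also behave when a label is longer than one character.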
Upvotes: 2