Create union of two columns in pandas

Question

I have two dataframes with identical columns. However the 'labels' column can have different labels. All labels are comma seperated strings. I want to make a union on the labels in order to go from this:

df1:

    id1   id2 labels language
0   1     1   1      en
1   2     3          en
2   3     4   4      en
3   4     5          en
4   5     6          en

df2:

    id1   id2 labels language
0   1     1   1,2    en
1   2     3          en
2   3     4   5,7    en
3   4     5          en
4   5     6   3      en

to this:

    id1   id2 labels language
0   1     1   1,2    en
1   2     3          en
2   3     4   4,5,7  en
3   4     5          en
4   5     6   3      en

I've tried this:

df1['labels'] = df1['labels'].apply(lambda x: set(str(x).split(',')))
df2['labels'] = df2['labels'].apply(lambda x: set(str(x).split(',')))
result = df1.merge(df2, on=['article_id', 'line_number', 'language'], how='outer')

result['labels'] = result[['labels_x', 'labels_y']].apply(lambda x: list(set.union(*x)) if None not in x else set(), axis=1)
result['labels'] = result['labels'].apply(lambda x: ','.join(set(x)))
result = result.drop(['labels_x', 'techniques_y'], axis=1)

but I get a wierd df with odd commas in some places, e.g the ,3.:

    id1   id2 labels language
0   1     1   1,2    en
1   2     3          en
2   3     4   4,5,7  en
3   4     5          en
4   5     6   ,3     en

How can I properly fix the commas? Any help is appreciated!

Create union of two columns in pandas

Answers (1)

Related Questions