hangc

Reputation: 5473

Pandas union of two columns of sets

I have two columns in a data frame containing sets.

How do I get a new column where each row contains the union of the items from the respective columns?

For example:

col1 : [{1,2} , {4,5}]
col2 : [{1,6} , {7,5}]
union : [{1,2,6}, {4,5,7}]

A naive try:

df['union'] = df['col1'].apply(lambda x: x.union(df['col2']))

does not work (it raises a TypeError).

Upvotes: 2

Views: 14208

Answers (1)

jezrael

Reputation: 863281

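I think you are very close. Your lambda passes the whole col2 Series to union instead of the set from the same row. Use DataFrame.apply with axis=1 so the lambda receives one row at a time: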

import pandas as pd

df = pd.DataFrame([[{1, 2}, {1, 6}], [{4, 5}, {7, 5}]], columns=['col1', 'col2'])

# axis=1 calls the lambda once per row; x is that row as a Series
df['union'] = df.apply(lambda x: x['col1'].union(x['col2']), axis=1)
print(df)
     col1    col2      union
0  {1, 2}  {1, 6}  {1, 2, 6}
1  {4, 5}  {5, 7}  {4, 5, 7}
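
For context, here is a minimal sketch of why the attempt from the question fails. It assumes the same two-row frame as above; the `failed` flag is only an illustrative name. `set.union` accepts any iterable and tries to add each of its items to the result, so passing the whole `col2` Series means trying to add sets (which are unhashable) as elements:

```python
import pandas as pd

df = pd.DataFrame({'col1': [{1, 2}, {4, 5}], 'col2': [{1, 6}, {7, 5}]})

try:
    # x is one set from col1, but df['col2'] is the entire column,
    # so union() iterates over the Series and meets set objects as items
    df['col1'].apply(lambda x: x.union(df['col2']))
    failed = False
except TypeError as exc:
    failed = True
    print('TypeError:', exc)
```

The exact error text may differ between Python versions, so it is not reproduced here.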

Another solution uses the | operator, which is the set union operator:

df['union'] = df.apply(lambda x: x['col1'] | x['col2'], axis=1)
print(df)
     col1    col2      union
0  {1, 2}  {1, 6}  {1, 2, 6}
1  {4, 5}  {5, 7}  {4, 5, 7}
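
As a possible alternative (not part of the solutions above, and only a sketch under the same example data), the two columns can be paired with `zip` in a plain list comprehension. This avoids building a row Series for every call, which may matter on larger frames, though I have not measured it here:

```python
import pandas as pd

df = pd.DataFrame({'col1': [{1, 2}, {4, 5}], 'col2': [{1, 6}, {7, 5}]})

# Pair the sets row by row and union each pair with the | operator
df['union'] = [a | b for a, b in zip(df['col1'], df['col2'])]
print(df)
```

The list is assigned by position, so this relies on both columns coming from the same frame, which guarantees they have the same length and row order.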

Upvotes: 4
