Reputation: 4807

Concat two columns and keep unique values if repeating

I have a dataframe as follows:

df = pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', None, 'dessert', None], 'type':[None, None, 'strawberry-tart', 'dessert', None]})

df
Out[10]:
foodstuff           type
0   apple-martini   None
1   apple-pie       None
2   None            strawberry-tart
3   dessert         dessert
4   None            None

I want to achieve the following:

df
Out[10]:
Combined
0   apple-martini   
1   apple-pie       
2   strawberry-tart
3   dessert         
4   None

The solution here tackles the case of combining the columns when one of the columns is definitely None. I am trying to achieve the case that if there is repeat of values in a row for the two columns, then only one value is retained.

Upvotes: 0

Answers (3)

Chidi

Reputation: 991

if you need to handle cases where some of the column values is '' and take the union of column values where there are not equal

test_df = pd.DataFrame({'col_1':['apple-martini', 'apple-pie', None, 'dessert', None, '', 'brown'], 'col_2':[None, None, 'strawberry-tart', 'dessert', None, 'cupcake', 'rice']})
test_df.fillna('', inplace=True)
test_df['col_1'] = test_df.apply(lambda x: x['col_1'].split(), axis=1)
test_df['col_2'] = test_df.apply(lambda x: x['col_2'].split(), axis=1)
test_df['set'] = test_df.apply(lambda x: set(x['col_1'] + x['col_2']), axis=1)
test_df['combined'] = test_df.apply(lambda x: ''.join(sorted(x['set'])), axis=1)

print(test_df['combined'])
#result
0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4                   
5            cupcake
6          brownrice

Upvotes: 0

Corralien

Reputation: 120399

You can use combine_first:

df['combined'] = df['foodstuff'].combine_first(df['type'])
print(df)

# Output:
       foodstuff             type         combined
0  apple-martini             None    apple-martini
1      apple-pie             None        apple-pie
2           None  strawberry-tart  strawberry-tart
3        dessert          dessert          dessert
4           None             None             None

Upvotes: 2

BENY

Reputation: 323226

We can just do fillna

df['combine'] = df.foodstuff.fillna(df.type)
0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4               None
Name: foodstuff, dtype: object

Upvotes: 1

Concat two columns and keep unique values if repeating

Answers (3)

Related Questions