alinaz

Reputation: 149

What is the best way to combine 2 string columns in pandas into a new column based on a specific condition?

I have a pandas dataframe with string values in each column. I would like to combine column 1 and column 2 into a new column, let's say column 4. However, if words in columns 1 and 2 are the same, I would like to combine columns 1 and 3 into the new column instead.

I tried collecting the pairs in a list first, planning to add it as a separate column later, but it didn't work. I'm new to Python, so I think I'm missing a much easier solution.

pairs = []
for row in df['interest1']:
    if row == df['interest2'].iloc[row]:
        pairs.append(df['interest1'] + ' ' + df['interest2'])
    else:
        pairs.append(df['interest1'] + ' ' + df['interest3'])

# a simple example of what I would like to achieve

import pandas as pd

lst= [['music','music','film','music film'],
      ['guitar','piano','violin','guitar piano'],
      ['music','photography','photography','music photography'],
     ]

df= pd.DataFrame(lst,columns=['interest1','interest2','interest3','first distinct pair'])
df

Upvotes: 1

Views: 311

Answers (1)

Ayoub ZAROU

Reputation: 2417

You could use the where method of a pandas Series, which keeps each value where the condition is True and takes the replacement value where it is False:

df['first_distinct_pair'] = (df['interest1'] + df['interest2']).where(df['interest1'] != df['interest2'],  df['interest1'] + df['interest3'])

If you want a space between the two words, you could do:

df['first_distinct_pair'] = (df['interest1'] + ' '+ df['interest2']).where(df['interest1'] != df['interest2'],  df['interest1'] + ' ' + df['interest3'])

The result looks like this:

>>> import pandas as pd
>>>
>>> lst = [['music', 'music', 'film'],
...        ['guitar', 'piano', 'violin'],
...        ['music', 'photography', 'photography']]
>>>
>>> df = pd.DataFrame(lst, columns=['interest1', 'interest2', 'interest3'])

>>> df['first_distinct_pair'] = (df['interest1'] + ' '+ df['interest2']).where(df['interest1'] != df['interest2'],  df['interest1'] + ' ' + df['interest3'])

>>> df
  interest1    interest2    interest3 first_distinct_pair
0     music        music         film          music film
1    guitar        piano       violin        guitar piano
2     music  photography  photography   music photography
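
For comparison, here is a minimal sketch of the same selection written two ways: with Series.where as above, and with numpy.where, which some people find easier to read as an if/else. The numpy version is an alternative I'm adding for illustration, not something the question asked for, and the names differs, with_where and with_np are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["music", "music", "film"],
     ["guitar", "piano", "violin"],
     ["music", "photography", "photography"]],
    columns=["interest1", "interest2", "interest3"],
)

# True where the first two interests differ.
differs = df["interest1"] != df["interest2"]

# Series.where keeps the caller's value where the mask is True
# and takes the value from the second argument where it is False.
with_where = (df["interest1"] + " " + df["interest2"]).where(
    differs, df["interest1"] + " " + df["interest3"]
)

# np.where(mask, a, b) picks from a where the mask is True, else from b.
with_np = np.where(
    differs,
    df["interest1"] + " " + df["interest2"],
    df["interest1"] + " " + df["interest3"],
)

df["first_distinct_pair"] = with_np
print(df["first_distinct_pair"].tolist())
# ['music film', 'guitar piano', 'music photography']
```

Neither version checks whether interest3 also repeats interest1. The question doesn't say what should happen in that case, so it is left unhandled here.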

Upvotes: 1
