Reputation: 149
I have a pandas dataframe with string values in each column. I would like to combine column 1 and column 2 into a new column, let's say column 4. However, if words in columns 1 and 2 are the same, I would like to combine columns 1 and 3 into the new column instead.
I tried to collect the pairs in a list first and add that list as a separate column later, but it didn't work. I'm new to Python, so I think I'm missing a much easier solution.
pairs = []
for row in df['interest1']:
    if row == df['interest2'].iloc[row]:
        pairs.append(df['interest1'] + ' ' + df['interest2'])
    else:
        pairs.append(df['interest1'] + ' ' + df['interest3'])
# A simple example of what I would like to achieve
import pandas as pd
lst = [['music', 'music', 'film', 'music film'],
       ['guitar', 'piano', 'violin', 'guitar piano'],
       ['music', 'photography', 'photography', 'music photography'],
       ]
df = pd.DataFrame(lst, columns=['interest1', 'interest2', 'interest3', 'first distinct pair'])
df
Upvotes: 1
Views: 311
Reputation: 2417
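You could use the where method of a pandas Series. It keeps the original values where the condition is True and takes the replacement values everywhere else: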
df['first_distinct_pair'] = (df['interest1'] + df['interest2']).where(df['interest1'] != df['interest2'], df['interest1'] + df['interest3'])
If you want a space between the two words, you could do:
df['first_distinct_pair'] = (df['interest1'] + ' '+ df['interest2']).where(df['interest1'] != df['interest2'], df['interest1'] + ' ' + df['interest3'])
The result looks something like this:
>>> import pandas as pd
>>> lst = [['music', 'music', 'film'],
...        ['guitar', 'piano', 'violin'],
...        ['music', 'photography', 'photography'],
...        ]
>>> df = pd.DataFrame(lst, columns=['interest1', 'interest2', 'interest3'])
>>> df['first_distinct_pair'] = (df['interest1'] + ' ' + df['interest2']).where(df['interest1'] != df['interest2'], df['interest1'] + ' ' + df['interest3'])
>>> df
   interest1    interest2    interest3  first_distinct_pair
0      music        music         film           music film
1     guitar        piano       violin         guitar piano
2      music  photography  photography    music photography
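As a variation on the same idea, here is a minimal self-contained sketch that picks the second word first and then joins it to interest1, so the string concatenation is only written once. It also includes a plain-Python version of the loop from the question, using zip to compare the values row by row. The names differs, second_word and pairs are just illustrative, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame(
    [["music", "music", "film"],
     ["guitar", "piano", "violin"],
     ["music", "photography", "photography"]],
    columns=["interest1", "interest2", "interest3"],
)

# True for rows where the first two interests differ (illustrative name).
differs = df["interest1"] != df["interest2"]

# Series.where keeps interest2 where `differs` is True
# and falls back to interest3 elsewhere.
second_word = df["interest2"].where(differs, df["interest3"])
df["first_distinct_pair"] = df["interest1"] + " " + second_word

# Loop-style equivalent: zip walks the three columns one row at a time,
# so each comparison is between two strings rather than whole columns.
pairs = [
    f"{a} {b if a != b else c}"
    for a, b, c in zip(df["interest1"], df["interest2"], df["interest3"])
]
```

Both approaches should give the same values on this example. The loop in the question fails because it compares each string against df['interest2'].iloc[row], where row is a string rather than a position, and then appends whole columns instead of single values.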
Upvotes: 1