Mighty

Reputation: 447

Create a new column based on condition applied from two other string columns in python

I have data in the following format:

pastLocation | currentLocation    
delhi        | bangalore          
delhi        | london,pune,delhi  
mumbai       | mumbai             
pune         | pune, noida       

I have to create a new column named changeInLocation: if pastLocation is present in currentLocation, the value of the new column should be 0, otherwise 1. For example, in the second row, pastLocation (delhi) is present in the corresponding currentLocation, so changeInLocation should be 0.

Output should be in following format:

pastLocation | currentLocation   | changeInLocation
delhi        | bangalore         | 1
delhi        | london,pune,delhi | 0
mumbai       | mumbai            | 0
pune         | pune, noida       | 0

Upvotes: 4

Views: 97

Answers (3)

Joe

Reputation: 12417

Similar to jezrael's solution (which is more complete anyway), but without casting:

df['changeInLocation'] = df.apply(lambda x: 0 if x['pastLocation'] in x['currentLocation'] else 1, axis=1)

Upvotes: 2

jpp

Reputation: 164843

Similar to jezrael's solution, but taking care to strip whitespace and to use a set for fast membership tests:

import pandas as pd

df = pd.DataFrame({'pastLocation': ['delhi', 'delhi', 'mumbai', 'pune'],
                   'currentLocation': ['bangalore', 'london,pune,delhi',
                                       'mumbai', 'pune, noida']})

sets = [{i.strip() for i in row} for row in df['currentLocation'].str.split(',').values]

df['changeInLocation'] = [int(past not in current)
                          for past, current in zip(df['pastLocation'], sets)]

print(df)

     currentLocation pastLocation  changeInLocation
0          bangalore        delhi                 1
1  london,pune,delhi        delhi                 0
2             mumbai       mumbai                 0
3        pune, noida         pune                 0
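
The same idea can be wrapped in a small helper, which keeps the split-and-strip logic in one place and makes it easy to test on single values. This is only a minimal sketch: the function name `location_changed` is hypothetical and not part of the answer above, and it assumes every cell is a non-null string.

```python
import pandas as pd


def location_changed(past, current):
    """Return 0 if `past` is one of the comma-separated entries in `current`, else 1.

    Hypothetical helper for illustration; assumes both arguments are strings.
    """
    # Split on commas and strip surrounding whitespace from each entry.
    entries = {item.strip() for item in current.split(',')}
    return int(past.strip() not in entries)


df = pd.DataFrame({'pastLocation': ['delhi', 'delhi', 'mumbai', 'pune'],
                   'currentLocation': ['bangalore', 'london,pune,delhi',
                                       'mumbai', 'pune, noida']})

# Apply the helper row by row without the overhead of DataFrame.apply.
df['changeInLocation'] = [location_changed(past, current)
                          for past, current in zip(df['pastLocation'],
                                                   df['currentLocation'])]
```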

Upvotes: 2

jezrael

Reputation: 863741

Use apply with the in operator to check membership, then cast to int:

df['changeInLocation'] = df.apply(lambda x: x['pastLocation'] not in x['currentLocation'], axis=1).astype(int)

Another solution is to zip the columns and use a list comprehension:

df['changeInLocation'] = [int(a not in b) for a, b in zip(df['pastLocation'], df['currentLocation'])]

print(df)
  pastLocation    currentLocation  changeInLocation
0        delhi          bangalore                 1
1        delhi  london,pune,delhi                 0
2       mumbai             mumbai                 0
3         pune        pune, noida                 0
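
One thing to keep in mind: `in` applied to two strings is a substring test, so a past location that is only part of a longer name still counts as present. Whether that is acceptable depends on the data, which the question does not specify. The sketch below uses made-up rows (not from the question) to compare the substring check with an exact match against the comma-separated entries:

```python
import pandas as pd

# Made-up data for illustration: each past location is only part of a longer name.
df = pd.DataFrame({'pastLocation': ['york', 'delhi'],
                   'currentLocation': ['new york, london', 'new delhi']})

pairs = list(zip(df['pastLocation'], df['currentLocation']))

# Substring check: 'york' is found inside 'new york, london'.
substring_flags = [int(a not in b) for a, b in pairs]

# Exact check: split on commas, strip whitespace, compare whole entries.
exact_flags = [int(a not in {s.strip() for s in b.split(',')}) for a, b in pairs]

print(substring_flags)
print(exact_flags)
```

With the substring check both rows are flagged as unchanged, while the exact check flags both as changed.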

Upvotes: 4
