Reputation: 3
I'm trying to create a column called 'code_city' with values from the 'code' column. To do this, I need to check whether the 'ds_city' and 'city' values are equal.
Here is a table sample:
https://i.sstatic.net/LZ3gC.png
I've tried this:
def find_code(data):
    if data['ds_city'] == data['city']:
        return data['code']
    else:
        return 'UNKNOWN'

df['code_city'] = df.apply(find_code, axis=1)
But since there are duplicates in the 'ds_city' column, this is the result:
https://i.sstatic.net/SxYfi.png
Here is an image of the expected result:
https://i.sstatic.net/W4D2E.png
How can I work around this?
Upvotes: 0
Views: 65
Reputation: 872
You can use pandas merge:
import pandas as pd

df = pd.merge(df, df[['code', 'city']], how='left',
              left_on='ds_city', right_on='city',
              suffixes=('', '_right')).drop(columns='city_right')
# output:
# code city ds_city code_right
# 0 1500107 ABAETETUBA ABAETETUBA 1500107
# 1 2900207 ABARE ABAETETUBA 1500107
# 2 2100055 ACAILANDIA ABAETETUBA 1500107
# 3 2300309 ACOPIARA ABAETETUBA 1500107
# 4 5200134 ACREUNA ABARE 2900207
Here's pandas.merge's documentation. It takes the input dataframe and left-joins its own code and city columns onto it wherever ds_city equals city.
The above code will fill code_right with NaN when the city is not found. You can then do the following to fill it with 'UNKNOWN':
df['code_right'] = df['code_right'].fillna('UNKNOWN')
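As a rough end-to-end sketch of the steps above, here is the merge run on a small frame. The first five rows are copied from the output shown earlier; the sixth row (code 9999999, 'HYPOTHETICAL', 'NOT_LISTED') is made up purely so that one ds_city value has no match. The drop_duplicates call is an extra precaution that is not in the original snippet: it assumes you want one code per city and keeps the left join from multiplying rows if a city appears twice.

```python
import pandas as pd

# First five rows come from the output above; the last row is a hypothetical
# addition so that one ds_city value has no matching city.
df = pd.DataFrame({
    "code": [1500107, 2900207, 2100055, 2300309, 5200134, 9999999],
    "city": ["ABAETETUBA", "ABARE", "ACAILANDIA", "ACOPIARA", "ACREUNA", "HYPOTHETICAL"],
    "ds_city": ["ABAETETUBA", "ABAETETUBA", "ABAETETUBA", "ABAETETUBA", "ABARE", "NOT_LISTED"],
})

# Lookup table with one row per city (assumption: one code per city).
lookup = df[["code", "city"]].drop_duplicates("city")

# Left join: every row of df is kept, and the code of the city named in
# ds_city is attached as code_right.
merged = pd.merge(
    df, lookup, how="left",
    left_on="ds_city", right_on="city",
    suffixes=("", "_right"),
).drop(columns="city_right")

# Rows without a match come back as NaN; replace them with the placeholder.
merged["code_right"] = merged["code_right"].fillna("UNKNOWN")
print(merged)
```

Note that because the unmatched row introduces NaN, the matched codes in code_right pass through a float column before the fill, so they may print as 1500107.0 rather than 1500107.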
Upvotes: 2
Reputation: 71
You could try this out:
# Begin with a column of only 'UNKNOWN' values.
data['code_city'] = "UNKNOWN"
# Build the list of city names once instead of on every iteration.
cities = data['city'].tolist()
# Iterate through the cities in the ds_city column.
for i, lookup_city in zip(data.index, data['ds_city']):
    # Cities with no match keep the 'UNKNOWN' value.
    if lookup_city not in cities:
        continue
    # Note the row which contains the corresponding city name in the city column.
    row = cities.index(lookup_city)
    # Reassign the current row's code_city value to the code from the row we found in the last step.
    data.loc[i, 'code_city'] = data['code'].iloc[row]
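If the loop turns out to be slow on a large frame, the same lookup idea can probably be expressed with a plain dictionary and Series.map. This is only a sketch of a swapped-in technique, not the approach above verbatim: city_to_code is a name introduced here for illustration, the sample rows are taken from the other answer's output, and the last row is a made-up entry with no match.

```python
import pandas as pd

# Sample rows from the thread; the last row is hypothetical and has no match.
data = pd.DataFrame({
    "code": [1500107, 2900207, 2100055, 2300309, 5200134, 9999999],
    "city": ["ABAETETUBA", "ABARE", "ACAILANDIA", "ACOPIARA", "ACREUNA", "HYPOTHETICAL"],
    "ds_city": ["ABAETETUBA", "ABAETETUBA", "ABAETETUBA", "ABAETETUBA", "ABARE", "NOT_LISTED"],
})

# Map each city name to its code. If a city appeared more than once,
# the last occurrence would win.
city_to_code = dict(zip(data["city"], data["code"]))

# Look up every ds_city value; fall back to 'UNKNOWN' when it is not a key.
data["code_city"] = data["ds_city"].map(lambda c: city_to_code.get(c, "UNKNOWN"))
print(data)
```

Because the fallback is applied inside the lookup, no NaN is ever produced and the matched codes stay integers.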
Upvotes: 0
Reputation: 323396
This looks like a job for np.where, which compares the two columns row by row:
import numpy as np
df['code_city'] = np.where(df['ds_city'] == df['city'], df['code'], 'UNKNOWN')
Upvotes: 0