Ankit S Kaushik
Ankit S Kaushik

Reputation: 104

Using pd.merge to map values for multiple columns in a dataframe from another dataframe

I have a dataframe(df3)

df3 = pd.DataFrame({
    'Origin':['DEL','BOM','AMD'],
    'Destination':['BOM','AMD','DEL']})

comprising of Travel Data which contains Origin/Destination and I'm trying to map Latitude and Longitude for Origin & Destination airports using 3 letter city codes (df_s3).

df_s3 = pd.DataFrame({
    'iata_code':['AMD','BOM','DEL'],
    'Lat':['72.6346969603999','72.8678970337','77.103104'],
    'Lon':['23.0771999359','19.0886993408','28.5665']})

I've tried mapping them one at a time, i.e.

df4=pd.merge(left=df3,right=df_s3,how='left',left_on=['Origin'],right_on=['iata_code'],suffixes=['_origin','_origin'])
df5=pd.merge(left=df4,right=df_s3,how='left',left_on=['Destination'],right_on=['iata_code'],suffixes=['_destination','_destination'])

This updates the values in the dataframe but the columns corresponding to origin lat/long have '_destination' as the suffix

I've even taken an aspirational long shot by combining the two, i.e.

df4=pd.merge(left=df3,right=df_s3,how='left',left_on=['Origin','Destination'],right_on=['iata_code','iata_code'],suffixes=['_origin','_destination'])

Both of these dont seem to be working. Any suggestions on how to make it work in a larger dataset while keeping the processing time low.

Upvotes: 1

Views: 181

Answers (2)

amos972
amos972

Reputation: 53

You can try to apply to each column a function like this one:

def from_place_to_coord(place: str):
    if place in df_s3['iata_code'].to_list():
        Lat = df_s3[df_s3['iata_code'] == place]['Lat'].values[0]
        Lon = df_s3[df_s3['iata_code'] == place]['Lon'].values[0]
        return Lat, Lon
    else:
        print('Not found')

and then:

df3['origin_loc'] = df3['Origin'].apply(from_place_to_coord)
df3['destination_loc'] = df3['Destination'].apply(from_place_to_coord)

It will return you 2 more columns with a tuple of Lat,Lon according to the location

Upvotes: 0

mjspier
mjspier

Reputation: 6526

Your solution was almost correct. But you need to specify the origin suffix in the second merge:

df4=pd.merge(left=df3,
             right=df_s3,how='left',
             left_on=['Origin'],
             right_on=['iata_code'])
df5=pd.merge(left=df4,
             right=df_s3,how='left',
             left_on=['Destination'],
             right_on=['iata_code'],
             suffixes=['_origin', '_destination'])

In the first merge you don't need to specify any suffix as there is no overlap. In the second merge you need to specify the suffix for the right side and the left side. The right side is the longitude and latitude from the origin and the left side are from the destination.

Upvotes: 1

Related Questions