Costa.Gustavo
Costa.Gustavo

Reputation: 849

Comparison between dataframes: Check if values of a column of one of the dataframes are in a list within a column of the other dataframe

Considering the two dataframes below:

import pandas as pd

df = pd.DataFrame({'ID_ANTENNA':[1,2,3,4,5], 'LAT':['-21.12699', '-21.12699', '-21.48194','-21.12699', '-21.11886'],
                  'LONG': ['-44.26144', '-44.26144', '-44.33039', '-44.21979', '-44.16478']})


city_ids = ['SJDR1', 'LD', 'SVM', 'TR']
sjdr = [1,2]
ld = [3]
svm = [4]
tr = [5]

df2 = pd.DataFrame({'ID_ANTENNA':[sjdr, ld, svm, tr], 'CITY_IDS':city_ids})

df2['ID_ANTENNA'] = df2['ID_ANTENNA'].astype(str)
df2['ID_ANTENNA'] = df2['ID_ANTENNA'].str.replace('[','')
df2['ID_ANTENNA'] = df2['ID_ANTENNA'].str.replace(']','')

I need to compare the column ID_ANTENNA from df with the same column from df2. If the value of ID_ANTENNA appears within df2's ID_ANTENNA, the CITY_IDS column must be added to df. An example of desired output:

ID |   LAT   |   LONG   |  CITY_IDS
1  |-21.12699| -44.26144|   SJDR1
2  |-21.12699| -44.26144|   SJDR1
3  |-21.48194| -44.33039|   LD
4  |-21.12699| -44.21979|   SVM
5  |-21.11886| -44.16478|   TR

However, I have no idea how to iterate over this column. Can anyone help?

P.s.: I transform the ID_ANTENNA column from df2 to string because it is the column format in the real scenario, the result of reading a .csv file.

Upvotes: 0

Views: 55

Answers (2)

Priya G
Priya G

Reputation: 46

for i in range(len(df['ID_ANTENNA'])):
    for j in range(len(df2['ID_ANTENNA'])):
        val=map(int,df2['ID_ANTENNA'][j].split(','))
        if df['ID_ANTENNA'][i] in val:
            df['CITY_IDS'].iloc[i]=df2['CITY_IDS'][j]


df

Run above code to get desired output:

enter image description here

Upvotes: 1

Quang Hoang
Quang Hoang

Reputation: 150735

You can split the ID_ANTENNA, explode it and merge. Note that the ID_ANTENNA in df is integer while ID_ANTENNA in df2 is string type:

df.merge(df2.assign(ID_ANTENNA=df2['ID_ANTENNA'].str.split(', '))
            .explode('ID_ANTENNA')
            .assign(ID_ANTENNA=lambda x: x['ID_ANTENNA'].astype(int)),
         on='ID_ANTENNA', how='left')

Output:

   ID_ANTENNA        LAT       LONG CITY_IDS
0           1  -21.12699  -44.26144    SJDR1
1           2  -21.12699  -44.26144    SJDR1
2           3  -21.48194  -44.33039       LD
3           4  -21.12699  -44.21979      SVM
4           5  -21.11886  -44.16478       TR

Upvotes: 1

Related Questions