user12729159
user12729159

Reputation: 141

Check for matching values in columns- python/pandas

I have a CSV file with Lat and long values of two models (Model 1 and Model 2) as shown below. I am trying to achieve following

Minimal working example:

import pandas as pd

df3= pd.read_csv("compare.csv")
Output=df3.loc[:, df3.columns.isin(list('LatLong'))]

Searched for the existing answers (1), (2) but could not found the solution.

sample_data.csv

Model 1                         Model 2     
Lat            Lon        Name        X        Y
-33.348652  138.751659  Kastan  -41.735983  145.532112
-41.735983  145.532112  Ldon    -37.222005  145.921452
-37.222005  145.921452  Papa    -33.348652  138.751659
-37.222005  145.921452  tine    -34.779284  138.522352
-37.222005  145.921452  Farm    -31.543177  118.4565685
-27.112811  150.904878  Loy     -38.2536    146.574569

enter image description here

Upvotes: 1

Views: 3654

Answers (3)

EEEEH
EEEEH

Reputation: 779

Try this

import pandas as pd

df3 = pd.read_csv("compare.csv", header=[1])

df3['Output'] = df3[['Lat', 'Lon']].merge(df3[['Name', 'X','Y']], 
                                          how="left",
                                          left_on=["Lat", "Lon"], 
                                          right_on=["X", "Y"])["Name"].fillna("NOT MATCHED")

cols = pd.MultiIndex.from_product([["Model 1"], df3.columns[:2]])
cols = cols.append(pd.MultiIndex.from_product([["Model 2"], df3.columns[2:]]))

df3.columns = cols

print(df3)

Output

    Model 1                 Model 2
    Lat         Lon         Name    X           Y           Output
0   -33.348652  138.751659  Bastyan -41.735983  145.532112  Papa
1   -41.735983  145.532112  Eildon  -37.222005  145.921452  Bastyan
2   -37.222005  145.921452  Papa    -33.348652  138.751659  Eildon
3   -37.222005  145.921452  Quar    -34.779284  138.522352  Eildon
4   -37.222005  145.921452  Coll    -31.543177  118.456569  Eildon
5   -27.112811  150.904878  Loy     -38.253600  146.574569  NOT MATCHED

Upvotes: 1

wwnde
wwnde

Reputation: 26676

Another way would be

#drop the multilevel columns and cross merge

 s=df.droplevel(level=0, axis=1).merge(df.droplevel(level=0, axis=1), how='cross', suffixes=('','_y'))

#filter the matches row_wise and rename columns

#s[((s['Y_y']==s['Lon'])|(s['X_y']==s['Lat']))].filter(regex='_y$', axis=1).rename(columns=lambda x: x.split('_')[0])
s[((s['Y_y']==s['Lon'])|(s['X_y']==s['Lat']))].filter(regex='_y$|Name', axis=1).rename(columns={'Name_y':'Output'}).rename(columns=lambda x: x.split('_')[0])

Outcome

      Name        Lat         Lon   Output          X           Y
2   Bastyan -37.222005  145.921452     Papa -33.348652  138.751659
6    Eildon -33.348652  138.751659  Bastyan -41.735983  145.532112
13     Papa -41.735983  145.532112   Eildon -37.222005  145.921452
19     Quar -41.735983  145.532112   Eildon -37.222005  145.921452
25     Coll -41.735983  145.532112   Eildon -37.222005  145.921452

Upvotes: 1

Chris
Chris

Reputation: 29742

One way using pandas.merge:

df["Output"] = df["Model 1"].merge(df["Model 2"], 
                                   how="left",
                                   left_on=["Lat", "Lon"], 
                                   right_on=["X", "Y"],
                                  )["Name"].fillna("NOT MATCHED")

Output:

     Model 1              Model 2                              Output
         Lat         Lon     Name          X           Y             
0 -33.348652  138.751659  Bastyan -41.735983  145.532112         Papa
1 -41.735983  145.532112   Eildon -37.222005  145.921452      Bastyan
2 -37.222005  145.921452     Papa -33.348652  138.751659       Eildon
3 -37.222005  145.921452     Quar -34.779284  138.522352       Eildon
4 -37.222005  145.921452     Coll -31.543177  118.456569       Eildon
5 -27.112811  150.904878      Loy -38.253600  146.574569  NOT MATCHED

Sample data used:

from io import StringIO

data = """Model 1,Model 1,Model 2,Model 2,Model 2
Lat,Lon,Name,X,Y
-33.348652,138.751659,Bastyan,-41.735983,145.532112
-41.735983,145.532112,Eildon,-37.222005,145.921452
-37.222005,145.921452,Papa,-33.348652,138.751659
-37.222005,145.921452,Quar,-34.779284,138.522352
-37.222005,145.921452,Coll,-31.543177,118.4565685
-27.112811,150.904878,Loy,-38.2536,146.574569"""

df = pd.read_csv(StringIO(data), sep=",", header=[0,1])

Upvotes: 1

Related Questions