Reputation: 141
I have a CSV file with Lat and long values of two models (Model 1 and Model 2) as shown below. I am trying to achieve following
Lat/ Long
values of model 1
in every row of model 2
columns Lat/ Long
values. If the Lat/ Long values of model 1 are found in model 2 then, print their respective area name such as 'Papa' in the new column. The process is repeated for the rest of the rows in model 1 and then will model 2.Lat/Long
values of model 1 didn't match with model 2, then print NOT matched
in the output.``Minimal working example:
import pandas as pd
df3= pd.read_csv("compare.csv")
Output=df3.loc[:, df3.columns.isin(list('LatLong'))]
Searched for the existing answers (1), (2) but could not found the solution.
sample_data.csv
Model 1 Model 2
Lat Lon Name X Y
-33.348652 138.751659 Kastan -41.735983 145.532112
-41.735983 145.532112 Ldon -37.222005 145.921452
-37.222005 145.921452 Papa -33.348652 138.751659
-37.222005 145.921452 tine -34.779284 138.522352
-37.222005 145.921452 Farm -31.543177 118.4565685
-27.112811 150.904878 Loy -38.2536 146.574569
Upvotes: 1
Views: 3654
Reputation: 779
Try this
import pandas as pd
df3 = pd.read_csv("compare.csv", header=[1])
df3['Output'] = df3[['Lat', 'Lon']].merge(df3[['Name', 'X','Y']],
how="left",
left_on=["Lat", "Lon"],
right_on=["X", "Y"])["Name"].fillna("NOT MATCHED")
cols = pd.MultiIndex.from_product([["Model 1"], df3.columns[:2]])
cols = cols.append(pd.MultiIndex.from_product([["Model 2"], df3.columns[2:]]))
df3.columns = cols
print(df3)
Output
Model 1 Model 2
Lat Lon Name X Y Output
0 -33.348652 138.751659 Bastyan -41.735983 145.532112 Papa
1 -41.735983 145.532112 Eildon -37.222005 145.921452 Bastyan
2 -37.222005 145.921452 Papa -33.348652 138.751659 Eildon
3 -37.222005 145.921452 Quar -34.779284 138.522352 Eildon
4 -37.222005 145.921452 Coll -31.543177 118.456569 Eildon
5 -27.112811 150.904878 Loy -38.253600 146.574569 NOT MATCHED
Upvotes: 1
Reputation: 26676
Another way would be
#drop the multilevel columns and cross merge
s=df.droplevel(level=0, axis=1).merge(df.droplevel(level=0, axis=1), how='cross', suffixes=('','_y'))
#filter the matches row_wise and rename columns
#s[((s['Y_y']==s['Lon'])|(s['X_y']==s['Lat']))].filter(regex='_y$', axis=1).rename(columns=lambda x: x.split('_')[0])
s[((s['Y_y']==s['Lon'])|(s['X_y']==s['Lat']))].filter(regex='_y$|Name', axis=1).rename(columns={'Name_y':'Output'}).rename(columns=lambda x: x.split('_')[0])
Outcome
Name Lat Lon Output X Y
2 Bastyan -37.222005 145.921452 Papa -33.348652 138.751659
6 Eildon -33.348652 138.751659 Bastyan -41.735983 145.532112
13 Papa -41.735983 145.532112 Eildon -37.222005 145.921452
19 Quar -41.735983 145.532112 Eildon -37.222005 145.921452
25 Coll -41.735983 145.532112 Eildon -37.222005 145.921452
Upvotes: 1
Reputation: 29742
One way using pandas.merge
:
df["Output"] = df["Model 1"].merge(df["Model 2"],
how="left",
left_on=["Lat", "Lon"],
right_on=["X", "Y"],
)["Name"].fillna("NOT MATCHED")
Output:
Model 1 Model 2 Output
Lat Lon Name X Y
0 -33.348652 138.751659 Bastyan -41.735983 145.532112 Papa
1 -41.735983 145.532112 Eildon -37.222005 145.921452 Bastyan
2 -37.222005 145.921452 Papa -33.348652 138.751659 Eildon
3 -37.222005 145.921452 Quar -34.779284 138.522352 Eildon
4 -37.222005 145.921452 Coll -31.543177 118.456569 Eildon
5 -27.112811 150.904878 Loy -38.253600 146.574569 NOT MATCHED
Sample data used:
from io import StringIO
data = """Model 1,Model 1,Model 2,Model 2,Model 2
Lat,Lon,Name,X,Y
-33.348652,138.751659,Bastyan,-41.735983,145.532112
-41.735983,145.532112,Eildon,-37.222005,145.921452
-37.222005,145.921452,Papa,-33.348652,138.751659
-37.222005,145.921452,Quar,-34.779284,138.522352
-37.222005,145.921452,Coll,-31.543177,118.4565685
-27.112811,150.904878,Loy,-38.2536,146.574569"""
df = pd.read_csv(StringIO(data), sep=",", header=[0,1])
Upvotes: 1