Reputation: 43
I have two dataframes, df1 and df2.
import pandas as pd
d = {'ID': [31,42,63,44,45,26],
     'lat': [64,64,64,64,64,64],
     'lon': [152,152,152,152,152,152],
     'other1': [12,13,14,15,16,17],
     'other2': [21,22,23,24,25,26]}
df1 = pd.DataFrame(data=d)
d2 = {'ID': [27,48,31,45,49,10],
      'LAT': [63,63,63,63,63,63],
      'LON': [153,153,153,153,153,153]}
df2 = pd.DataFrame(data=d2)
df1 has incorrect values for the columns lat and lon, but has correct data in the other columns that I need to keep track of. df2 has correct LAT and LON values but only has a few IDs in common with df1. There are two things I would like to accomplish. First, I want to split df1 into two dataframes: df3, which has the IDs that are present in df2, and df4, which has everything else. I can get df3 with:
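from functools import reduce
import numpy as np
# collect the rows whose ID is in both frames, then concatenate once
# (DataFrame.append was removed in pandas 2.0)
parts = [df1.loc[df1.ID == i] for i in reduce(np.intersect1d, [df1.ID, df2.ID])]
df3 = pd.concat(parts)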
But how do I get df4 to be the remaining data?
Second, I want to replace the lat and lon values in df3 with the correct data from df2.
I figure there is a slick Python way to do something like:
for j in range(len(df3)):
    for k in range(len(df2)):
        if df3.ID[j] == df2.ID[k]:
            df3.lat[j] = df2.LAT[k]
            df3.lon[j] = df2.LON[k]
But I can't even get the above nested loop working correctly, and I don't want to spend a lot of time on it if there is a better way to do this in Python.
Upvotes: 1
Views: 71
Reputation: 195593
For question 1, you can use boolean indexing:
m = df1.ID.isin(df2.ID)
df3 = df1[m]
df4 = df1[~m]
print(df3)
print(df4)
Prints:
   ID  lat  lon  other1  other2
0  31   64  152      12      21
4  45   64  152      16      25
   ID  lat  lon  other1  other2
1  42   64  152      13      22
2  63   64  152      14      23
3  44   64  152      15      24
5  26   64  152      17      26
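A self-contained version of the split, rebuilding the data from the question, makes it easy to check that m and ~m partition df1: each row lands in exactly one of df3 and df4, and both keep df1's original index labels. This is only a sketch for checking the idea; the assertions are illustrative, not part of the answer above.

```python
import pandas as pd

# Data from the question
df1 = pd.DataFrame({'ID': [31, 42, 63, 44, 45, 26],
                    'lat': [64] * 6,
                    'lon': [152] * 6,
                    'other1': [12, 13, 14, 15, 16, 17],
                    'other2': [21, 22, 23, 24, 25, 26]})
df2 = pd.DataFrame({'ID': [27, 48, 31, 45, 49, 10],
                    'LAT': [63] * 6,
                    'LON': [153] * 6})

# One boolean mask: True where df1's ID also appears in df2
m = df1['ID'].isin(df2['ID'])
df3 = df1[m]    # rows with a matching ID in df2
df4 = df1[~m]   # all remaining rows

# Every row of df1 ends up in exactly one of the two frames
assert len(df3) + len(df4) == len(df1)
assert df3.index.intersection(df4.index).empty
print(df3['ID'].tolist(), df4['ID'].tolist())  # [31, 45] [42, 63, 44, 26]
```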
For question 2:
x = df3.merge(df2, on="ID")[["ID", "other1", "other2", "LAT", "LON"]]
print(x)
Prints:
   ID  other1  other2  LAT  LON
0  31      12      21   63  153
1  45      16      25   63  153
EDIT: For question 2 you can do:
x = df3.merge(df2, on="ID").drop(columns=["lat", "lon"])
print(x)
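The nested loop in the question most likely fails because df3.ID[j] looks up index labels, not positions: df3 keeps df1's labels (0 and 4 here), so j = 1 raises a KeyError. Chained assignment such as df3.lat[j] = ... is also not guaranteed to write back into df3. If you want to overwrite lat and lon in place and keep df1's column names, a loop-free sketch is to map each ID to its value in df2 with Series.map. This assumes the IDs in df2 are unique (set_index plus map would fail on duplicates).

```python
import pandas as pd

# Data from the question
df1 = pd.DataFrame({'ID': [31, 42, 63, 44, 45, 26],
                    'lat': [64] * 6,
                    'lon': [152] * 6,
                    'other1': [12, 13, 14, 15, 16, 17],
                    'other2': [21, 22, 23, 24, 25, 26]})
df2 = pd.DataFrame({'ID': [27, 48, 31, 45, 49, 10],
                    'LAT': [63] * 6,
                    'LON': [153] * 6})

m = df1['ID'].isin(df2['ID'])
# .copy() makes df3 an independent frame rather than a slice of df1
df3 = df1[m].copy()

# Look up the correct coordinates by ID (assumes IDs in df2 are unique)
correct = df2.set_index('ID')
df3['lat'] = df3['ID'].map(correct['LAT'])
df3['lon'] = df3['ID'].map(correct['LON'])
print(df3)
```

Because every ID in df3 exists in df2 by construction, map finds a value for each row and no NaN is introduced.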
Upvotes: 2
Reputation: 75140
You can merge with an indicator column, keep LAT and LON where they exist and fill the rest from lat and lon, then use the indicator as a grouper to build a dictionary of dataframes. Then grab each part by its key:
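import numpy as np
u = df1.merge(df2,on='ID',how='left',indicator='I')
# where LAT/LON are missing (ID not in df2), fall back to the old lat/lon values
u[['LAT','LON']] = np.where(u[['LAT','LON']].isna(),u[['lat','lon']],u[['LAT','LON']])
# drop the old columns; drop's axis must be passed by keyword in pandas 2.x
u = u.drop(columns=['lat','lon'])
u['I'] = np.where(u['I'].eq("left_only"),"left_df","others")
d = dict(iter(u.groupby("I")))
print(d['left_df'],'\n--------\n',d['others'])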
   ID  other1  other2   LAT    LON        I
1  42      13      22  64.0  152.0  left_df
2  63      14      23  64.0  152.0  left_df
3  44      15      24  64.0  152.0  left_df
5  26      17      26  64.0  152.0  left_df
--------
   ID  other1  other2   LAT    LON       I
0  31      12      21  63.0  153.0  others
4  45      16      25  63.0  153.0  others
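The np.where step amounts to filling the missing LAT and LON values from lat and lon, so Series.fillna can express it directly. A sketch of the same steps using fillna, and splitting on the indicator values 'left_only' and 'both' (the only two that a left merge produces) instead of building a dictionary; the names matched and unmatched are just illustrative:

```python
import pandas as pd

# Data from the question
df1 = pd.DataFrame({'ID': [31, 42, 63, 44, 45, 26],
                    'lat': [64] * 6,
                    'lon': [152] * 6,
                    'other1': [12, 13, 14, 15, 16, 17],
                    'other2': [21, 22, 23, 24, 25, 26]})
df2 = pd.DataFrame({'ID': [27, 48, 31, 45, 49, 10],
                    'LAT': [63] * 6,
                    'LON': [153] * 6})

u = df1.merge(df2, on='ID', how='left', indicator='I')
# Keep df2's LAT/LON where a match exists, otherwise fall back to df1's lat/lon
u['LAT'] = u['LAT'].fillna(u['lat'])
u['LON'] = u['LON'].fillna(u['lon'])
u = u.drop(columns=['lat', 'lon'])

# 'both' rows matched an ID in df2; 'left_only' rows did not
matched = u[u['I'] == 'both']
unmatched = u[u['I'] == 'left_only']
print(unmatched, matched, sep='\n--------\n')
```

As in the answer, LAT and LON come out as floats, because the left merge introduces NaN before the fill.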
Upvotes: 1