Reputation: 75
this is probably a stupid beginner's question but I simply cannot figure it out after trying many different ways.
I have two pandas DataFrames. They look like the following:
df0 = pd.DataFrame(columns = ["x","y","z","val0","val1"])
df0 = df0.append({"x":0,"y":0,"z":0,"val0":0,"val1":0}, ignore_index=True)
df0 = df0.append({"x":0,"y":0,"z":1,"val0":0,"val1":0}, ignore_index=True)
df0 = df0.append({"x":0,"y":1,"z":0,"val0":0,"val1":0}, ignore_index=True)
df0 = df0.append({"x":0,"y":1,"z":1,"val0":0,"val1":0}, ignore_index=True)
df0 = df0.append({"x":1,"y":0,"z":0,"val0":0,"val1":0}, ignore_index=True)
df0 = df0.append({"x":1,"y":0,"z":1,"val0":0,"val1":0}, ignore_index=True)
df0 = df0.append({"x":1,"y":1,"z":0,"val0":0,"val1":0}, ignore_index=True)
df0 = df0.append({"x":1,"y":1,"z":1,"val0":0,"val1":0}, ignore_index=True)
and
df1 = pd.DataFrame(columns = ["x","y","z","val0","val1"])
df1 = df1.append({"x":0,"y":1,"z":0,"val0":2,"val1":2}, ignore_index=True)
df1 = df1.append({"x":1,"y":0,"z":1,"val0":4,"val1":4}, ignore_index=True)
As you can imagine, x
, y
, and z
serve as "coordinates" for the two value columns val0
and val1
. So I want to replace the values in df0
by the ones given in df1
according to the coordinates and not the DF-internal index. Because, if I just run df0.update(df1)
, the first two lines (with coordinates 000
and 001
) are replaced and, of course, also the values for x
, y
, and z
are replaced such that I end up with two lines for 010
and 101
. If I use df0.merge(df1)
I end up with sort-of the intersection of the two, so only two lines and both sets of values survive.
How do I do this properly then?
And even more: how do I do it properly for a variable set of columns/coordinates? In my use case, the data frames are built from a list of columns (which can change per scenario, of course df0
and df1
are built in the same way though) and a boolean flag per element that tells me whether a column of a given name is a coordinate or value.
Any advice would be welcomed.
Thank you!
c
Upvotes: 2
Views: 517
Reputation: 195418
If you want update val0
and val1
according x
, y
and z
, you can use .merge
+ .update
:
df0.update(df0[["x", "y", "z"]].merge(df1, on=["x", "y", "z"], how="left"))
print(df0)
Prints:
x y z val0 val1
0 0 0 0 0 0
1 0 0 1 0 0
2 0 1 0 2 2
3 0 1 1 0 0
4 1 0 0 0 0
5 1 0 1 4 4
6 1 1 0 0 0
7 1 1 1 0 0
Upvotes: 4