conni
conni

Reputation: 75

Updating pandas DataFrame according to columns not index

this is probably a stupid beginner's question but I simply cannot figure it out after trying many different ways.

I have two pandas DataFrames. They look like the following:

df0 = pd.DataFrame(columns = ["x","y","z","val0","val1"])
df0 = df0.append({"x":0,"y":0,"z":0,"val0":0,"val1":0}, ignore_index=True)
df0 = df0.append({"x":0,"y":0,"z":1,"val0":0,"val1":0}, ignore_index=True)
df0 = df0.append({"x":0,"y":1,"z":0,"val0":0,"val1":0}, ignore_index=True)
df0 = df0.append({"x":0,"y":1,"z":1,"val0":0,"val1":0}, ignore_index=True)
df0 = df0.append({"x":1,"y":0,"z":0,"val0":0,"val1":0}, ignore_index=True)
df0 = df0.append({"x":1,"y":0,"z":1,"val0":0,"val1":0}, ignore_index=True)
df0 = df0.append({"x":1,"y":1,"z":0,"val0":0,"val1":0}, ignore_index=True)
df0 = df0.append({"x":1,"y":1,"z":1,"val0":0,"val1":0}, ignore_index=True)

and

df1 = pd.DataFrame(columns = ["x","y","z","val0","val1"])
df1 = df1.append({"x":0,"y":1,"z":0,"val0":2,"val1":2}, ignore_index=True)
df1 = df1.append({"x":1,"y":0,"z":1,"val0":4,"val1":4}, ignore_index=True)

As you can imagine, x, y, and z serve as "coordinates" for the two value columns val0 and val1. So I want to replace the values in df0 by the ones given in df1 according to the coordinates and not the DF-internal index. Because, if I just run df0.update(df1), the first two lines (with coordinates 000 and 001) are replaced and, of course, also the values for x, y, and z are replaced such that I end up with two lines for 010 and 101. If I use df0.merge(df1) I end up with sort-of the intersection of the two, so only two lines and both sets of values survive.

How do I do this properly then? And even more: how do I do it properly for a variable set of columns/coordinates? In my use case, the data frames are built from a list of columns (which can change per scenario, of course df0 and df1 are built in the same way though) and a boolean flag per element that tells me whether a column of a given name is a coordinate or value.

Any advice would be welcomed.

Thank you!

c

Upvotes: 2

Views: 517

Answers (1)

Andrej Kesely
Andrej Kesely

Reputation: 195418

If you want update val0 and val1 according x, y and z, you can use .merge + .update:

df0.update(df0[["x", "y", "z"]].merge(df1, on=["x", "y", "z"], how="left"))
print(df0)

Prints:

   x  y  z val0 val1
0  0  0  0    0    0
1  0  0  1    0    0
2  0  1  0    2    2
3  0  1  1    0    0
4  1  0  0    0    0
5  1  0  1    4    4
6  1  1  0    0    0
7  1  1  1    0    0

Upvotes: 4

Related Questions