Reputation: 59544
Suppose I have two data frame 'df_a' & 'df_b' , both have the same index structure and columns, but some of the inside data elements are different:
>>> df_a
sales cogs
STK_ID QT
000876 1 100 100
2 100 100
3 100 100
4 100 100
5 100 100
6 100 100
7 100 100
>>> df_b
sales cogs
STK_ID QT
000876 5 50 50
6 50 50
7 50 50
8 50 50
9 50 50
10 50 50
And now I want to replace the element of df_a by element of df_b which have the same (index, column) coordinate, and attach df_b's elements whose (index, column) coordinate beyond the scope of df_a . Just like add a patch 'df_b' to 'df_a' :
>>> df_c = patch(df_a,df_b)
sales cogs
STK_ID QT
000876 1 100 100
2 100 100
3 100 100
4 100 100
5 50 50
6 50 50
7 50 50
8 50 50
9 50 50
10 50 50
How to write the 'patch(df_a,df_b)' function ?
Upvotes: 5
Views: 1091
Reputation: 5467
Similar to BrenBarn's answer, but with more flexibility:
# reindex both to union of indices
df_ar = df_a.reindex(df_a.index | df_b.index)
df_br = df_b.reindex(df_a.index | df_b.index)
# replacement criteria can be put in this lambda function
combiner = lambda: x, y: np.where(y < x, y, x)
df_c = df_ar.combine(df.br, combiner)
Upvotes: 1
Reputation:
I was struggling with the same issue, the code in the previous answers didn't work in my dataframes. They have 2 index columns and the reindex operation results in NaN values in strange places (I'll post the dataframe contents if anyone is willing do debug it).
I found an alternate solution. I'm reviving this thread hoping this may be useful to others:
# concatenate df_a and df_b
df_c = concat([dfbd,dfplanilhas])
# clears the indexes (turns the index columns into regular dataframe columns)
df_c.reset_index(inplace='True')
# removes duplicates keeping the last occurence (hence updating df_a with values from df_b)
df_c.drop_duplicates(subset=['df_a','df_b'], take_last='True', inplace='True')
Not a very elegant solution, but seems to work.
I hope df.update gets a join='outer' option soon...
Upvotes: 0
Reputation: 49886
To fill gaps in one dataframe with values (or even full rows) from another, take a look at the df.combine_first() built-in method.
In [34]: df_b.combine_first(df_a)
Out[34]:
sales cogs
STK_ID QT
000876 1 100 100
2 100 100
3 100 100
4 100 100
5 50 50
6 50 50
7 50 50
8 50 50
9 50 50
10 50 50
Upvotes: 2
Reputation: 251488
Try this:
df_c = df_a.reindex(df_a.index | df_b.index)
df_c.ix[df_b.index] = df_b
Upvotes: 2