Consolidating two data frames in Python

Question

How to overlay/consolidate two data frames in python, such that overlapping cells (index,column) get added and uncommon cells preserve the values from original data frame?

This is possible in Excel as explained here.

For Eg: I have two data frames

df1 = df2 =

I want the output to be like

output_df =

jezrael · Accepted Answer

You need add with fillna, but in df1 and df2 have to be NO NaN values, because fillna removed them:

print (df1.add(df2, fill_value=0).fillna(0).astype(int))
    a    b    c
a1  1    2    0
a2  3  514  123
a3  5  539  134

Another solution with union of both indexes and reindex - works if NaN in DataFrames:

idx = df1.index.union(df2.index)

print (df1.reindex(idx, fill_value=0))
    a  b
a1  1  2
a2  3  4
a3  5  6

print (df2.reindex(idx, fill_value=0))
      b    c
a1    0    0
a2  510  123
a3  533  134

idx = df1.index.union(df2.index)
print (df1.reindex(idx, fill_value=0)
       .add(df2.reindex(idx, fill_value=0), fill_value=0)
       .astype(int))
    a    b    c
a1  1    2    0
a2  3  514  123
a3  5  539  134

df1 = pd.DataFrame({'a':[1,3,5],
                   'b':[2,4,6]}, index=['a1','a2','a3'])

print (df1)
    a  b
a1  1  2
a2  3  4
a3  5  6

df2 = pd.DataFrame({'b':[510,533],
                   'c':[123,np.nan]}, index=['a2','a3'])

print (df2)
      b      c
a2  510  123.0
a3  533    NaN

print (df1.reindex(idx, fill_value=0).add(df2.reindex(idx, fill_value=0), fill_value=0))
      a    b      c
a1  1.0    2    0.0
a2  3.0  514  123.0
a3  5.0  539    NaN

print (df1.add(df2, fill_value=0).fillna(0).astype(int))
    a    b    c
a1  1    2    0
a2  3  514  123
a3  5  539    0 <- replace NaN to 0

Consolidating two data frames in Python

Answers (1)

Related Questions