nachiappanpl
nachiappanpl

Reputation: 783

Consolidating two data frames in Python

How to overlay/consolidate two data frames in python, such that overlapping cells (index,column) get added and uncommon cells preserve the values from original data frame?

This is possible in Excel as explained here.

For Eg: I have two data frames

df1 = df1 df2 = df2

I want the output to be like

output_df = output_df

Upvotes: 1

Views: 200

Answers (1)

jezrael
jezrael

Reputation: 862581

You need add with fillna, but in df1 and df2 have to be NO NaN values, because fillna removed them:

print (df1.add(df2, fill_value=0).fillna(0).astype(int))
    a    b    c
a1  1    2    0
a2  3  514  123
a3  5  539  134

Another solution with union of both indexes and reindex - works if NaN in DataFrames:

idx = df1.index.union(df2.index)

print (df1.reindex(idx, fill_value=0))
    a  b
a1  1  2
a2  3  4
a3  5  6

print (df2.reindex(idx, fill_value=0))
      b    c
a1    0    0
a2  510  123
a3  533  134

idx = df1.index.union(df2.index)
print (df1.reindex(idx, fill_value=0)
       .add(df2.reindex(idx, fill_value=0), fill_value=0)
       .astype(int))
    a    b    c
a1  1    2    0
a2  3  514  123
a3  5  539  134

df1 = pd.DataFrame({'a':[1,3,5],
                   'b':[2,4,6]}, index=['a1','a2','a3'])

print (df1)
    a  b
a1  1  2
a2  3  4
a3  5  6

df2 = pd.DataFrame({'b':[510,533],
                   'c':[123,np.nan]}, index=['a2','a3'])

print (df2)
      b      c
a2  510  123.0
a3  533    NaN

print (df1.reindex(idx, fill_value=0).add(df2.reindex(idx, fill_value=0), fill_value=0))
      a    b      c
a1  1.0    2    0.0
a2  3.0  514  123.0
a3  5.0  539    NaN
print (df1.add(df2, fill_value=0).fillna(0).astype(int))
    a    b    c
a1  1    2    0
a2  3  514  123
a3  5  539    0 <- replace NaN to 0

Upvotes: 1

Related Questions