Merging dataframes and ignoring duplicates indices (selectively) in Pandas

Question

I'd like to concatenate two dataframes df1, df2 to a new one where values in duplicate indices/columns are assigned as NA if they exist in both df1 and df2, but real values considered should they exist in only one:

df1:

    A    B    C
X   NA   1    2    #Note: (X,B) also in df2
Y   3    2    NA   #Note: (Y,B) also in df2
Z   NA   NA   4

df2:

    A    B    F
X   -1   -1   NA    #Note: (X,A) not in df1
Y   NA   -3   -4    #Note: (Y,A) not in df1
W   NA   -3   -4

Output:

    A    B    C    F
X   -1   NA   2    NA
Y   3    NA   NA   -4
Z   NA   NA   4    NA
W   NA   -3   NA   -4

I think that something like pd.concat([df1,df2], axis=1) should work, but I dont know how to make the special condition for the indices/columns that have values. Thank you in advance

jezrael · Accepted Answer

You can use combine_first and then replace NaN by mask - mask is created by notnull and &, NaN are replaced to False by fillna:

mask = (df1.notnull() & df2.notnull()).fillna(False)
print (mask)
       A      B      C      F
W  False  False  False  False
X  False   True  False  False
Y  False   True  False  False
Z  False  False  False  False

#if necessary change order of index values add reindex
df = df1.combine_first(df2).reindex(['X','Y','Z', 'W'])
print (df)
     A    B    C    F
X -1.0  1.0  2.0  NaN
Y  3.0  2.0  NaN -4.0
Z  NaN  NaN  4.0  NaN
W  NaN -3.0  NaN -4.0

print (df.mask(mask))
     A    B    C    F
X -1.0  NaN  2.0  NaN
Y  3.0  NaN  NaN -4.0
Z  NaN  NaN  4.0  NaN
W  NaN -3.0  NaN -4.0

Merging dataframes and ignoring duplicates indices (selectively) in Pandas

Answers (1)

Related Questions