Reputation: 1949
I'd like to concatenate two dataframes df1, df2 to a new one where values in duplicate indices/columns are assigned as NA
if they exist in both df1 and df2, but real values considered should they exist in only one:
df1:
A B C
X NA 1 2 #Note: (X,B) also in df2
Y 3 2 NA #Note: (Y,B) also in df2
Z NA NA 4
df2:
A B F
X -1 -1 NA #Note: (X,A) not in df1
Y NA -3 -4 #Note: (Y,A) not in df1
W NA -3 -4
Output:
A B C F
X -1 NA 2 NA
Y 3 NA NA -4
Z NA NA 4 NA
W NA -3 NA -4
I think that something like pd.concat([df1,df2], axis=1)
should work, but I dont know how to make the special condition for the indices/columns that have values.
Thank you in advance
Upvotes: 1
Views: 778
Reputation: 862431
You can use combine_first
and then replace NaN
by mask
- mask
is created by notnull
and &
, NaN
are replaced to False
by fillna
:
mask = (df1.notnull() & df2.notnull()).fillna(False)
print (mask)
A B C F
W False False False False
X False True False False
Y False True False False
Z False False False False
#if necessary change order of index values add reindex
df = df1.combine_first(df2).reindex(['X','Y','Z', 'W'])
print (df)
A B C F
X -1.0 1.0 2.0 NaN
Y 3.0 2.0 NaN -4.0
Z NaN NaN 4.0 NaN
W NaN -3.0 NaN -4.0
print (df.mask(mask))
A B C F
X -1.0 NaN 2.0 NaN
Y 3.0 NaN NaN -4.0
Z NaN NaN 4.0 NaN
W NaN -3.0 NaN -4.0
Upvotes: 1