Reputation: 1223
I have a dataframe df1
with 3 columns (A,B,C), NaN represents missing value here
A B C
1 2 NaN
2 1 2.3
2 3 2.5
I have a dataframe df2
with 3 columns (A,B,D)
A B D
1 2 2
2 1 2
2 3 4
The expected output would be
A B C
1 2 2
2 1 2.3
2 3 2.5
I want to have values in column C in df1
intact if not missing, replaced by corresponding value in D with other two columns value equal, i.e, df1.A==df2.A
and df1.B==df2.B
any good solution?
Upvotes: 0
Views: 407
Reputation: 7828
IIUC:
In [100]: df['C'] = np.where((np.isnan(df.C))&(df.A==df1.A)&(df.B==df1.B),df1.D,df.C)
In [101]: df
Out[101]:
A B C
0 1.0 2.0 2.0
1 2.0 1.0 2.3
2 2.3 1.2 2.5
np.where
is faster when compared:
In [102]: %timeit df['C'] = np.where((np.isnan(df.C))&(df.A==df1.A)&(df.B==df1.B),df1.D,df.C)
1000 loops, best of 3: 1.3 ms per loop
In [103]: %timeit df.set_index(['A', 'B'])['C'].fillna(df1.set_index(['A', 'B'])['D']).reset_index()
100 loops, best of 3: 5.92 ms per loop
Upvotes: 1
Reputation:
One way would be to use the columns A and B as the index. If you use fillna then, pandas will align the indices and give you the correct result:
df1.set_index(['A', 'B'])['C'].fillna(df2.set_index(['A', 'B'])['D']).reset_index()
Out:
A B C
0 1 2 2.0
1 2 1 2.3
2 2 3 2.5
Upvotes: 1