Jin
Jin

Reputation: 1223

update missing values in Python Pandas dataframe with matching conditions

I have a dataframe df1 with 3 columns (A,B,C), NaN represents missing value here

A     B      C  
1     2    NaN
2     1    2.3
2     3    2.5

I have a dataframe df2 with 3 columns (A,B,D)

A     B     D
1     2     2
2     1     2
2     3     4

The expected output would be

A     B      C
1     2      2
2     1      2.3
2     3      2.5

I want to have values in column C in df1 intact if not missing, replaced by corresponding value in D with other two columns value equal, i.e, df1.A==df2.A and df1.B==df2.B

any good solution?

Upvotes: 0

Views: 407

Answers (2)

shivsn
shivsn

Reputation: 7828

IIUC:

In [100]: df['C'] = np.where((np.isnan(df.C))&(df.A==df1.A)&(df.B==df1.B),df1.D,df.C)

In [101]: df
Out[101]: 
     A    B    C
0  1.0  2.0  2.0
1  2.0  1.0  2.3
2  2.3  1.2  2.5

np.where is faster when compared:

In [102]: %timeit df['C'] = np.where((np.isnan(df.C))&(df.A==df1.A)&(df.B==df1.B),df1.D,df.C)
1000 loops, best of 3: 1.3 ms per loop


In [103]: %timeit df.set_index(['A', 'B'])['C'].fillna(df1.set_index(['A', 'B'])['D']).reset_index()
100 loops, best of 3: 5.92 ms per loop

Upvotes: 1

user2285236
user2285236

Reputation:

One way would be to use the columns A and B as the index. If you use fillna then, pandas will align the indices and give you the correct result:

df1.set_index(['A', 'B'])['C'].fillna(df2.set_index(['A', 'B'])['D']).reset_index()
Out: 
   A  B    C
0  1  2  2.0
1  2  1  2.3
2  2  3  2.5

Upvotes: 1

Related Questions