Cleb

Reputation: 26039

Dataframe is not updated when columns are passed to function using apply

I have two dataframes like this:

   A   B
a  1  10
b  2  11
c  3  12
d  4  13 

   A   B
a  11 NaN
b NaN NaN
c NaN  20
d  16  30

They have identical column names and indices. My goal is to replace the NaNs in df2 with the corresponding values from df1. Currently, I do it like this:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'A': range(1, 5), 'B': range(10, 14)}, index=list('abcd'))
df2 = pd.DataFrame({'A': [11, np.nan, np.nan, 16], 'B': [np.nan, np.nan, 20, 30]}, index=list('abcd'))    

def repl_na(s, d):
    # Overwrite the NaN entries of column s with the values from the same rows of d[s.name]
    s[s.isnull().values] = d[s.isnull().values][s.name]
    return s

df2.apply(repl_na, args=(df1, ))

which gives me the desired output:

    A   B
a  11  10
b   2  11
c   3  20
d  16  30

My question is how this could be accomplished if the indices of the dataframes are different (the column names are still the same, and the columns have the same length). So I would have a df2 like this (df1 is unchanged):

    A   B
0  11 NaN
1 NaN NaN
2 NaN  20
3  16  30

Then the above code does not work anymore since the indices of the dataframes are different. Could someone tell me how the line

s[s.isnull().values] = d[s.isnull().values][s.name]

has to be modified in order to get the same result as above?

Upvotes: 2

Views: 80

Answers (1)

Joachim Isaksson

Reputation: 181077

You could temporarily change the index of df1 to match that of df2 and then use combine_first with df2:

df2.combine_first(df1.set_index(df2.index))

    A   B
0  11  10
1   2  11
2   3  20
3  16  30
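
For reference, here is a self-contained sketch of the above using the frames from the question, assuming df2 has the default integer index 0..3. The names result and by_position are only illustrative. The second variant is an alternative that skips the re-indexing and matches rows purely by position; it relies on both frames having the same shape and column order.

```python
import numpy as np
import pandas as pd

# Frames from the question; df2 uses the default integer index 0..3.
df1 = pd.DataFrame({'A': range(1, 5), 'B': range(10, 14)}, index=list('abcd'))
df2 = pd.DataFrame({'A': [11, np.nan, np.nan, 16], 'B': [np.nan, np.nan, 20, 30]})

# Relabel df1's rows with df2's index so both frames align row by row,
# then keep df2's values where present and fall back to df1 elsewhere.
# set_index returns a new frame, so df1 itself is left untouched.
result = df2.combine_first(df1.set_index(df2.index))

# Alternative: ignore the labels and fill by position using the raw array.
# Assumes identical shape and column order in both frames.
by_position = df2.where(df2.notna(), df1.to_numpy())

print(result)
print(by_position)
```

Note that the filled columns come back as floats, because df2 held NaN values to begin with.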

Upvotes: 3
