Reputation: 7625
I have two data frames:
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [400, np.nan, 600]})
>>> print(df)
   A      B
0  1  400.0
1  2    NaN
2  3  600.0
and
>>> new_df = pd.DataFrame({'B': [4, 5, 6], 'C': [7, 8, 9]})
>>> print(new_df)
   B  C
0  4  7
1  5  8
2  6  9
How can I update df from new_df so that only the NaN values are filled? I would like to get the following:
>>> print(df)
   A      B
0  1  400.0
1  2    5.0
2  3  600.0
Upvotes: 1
Views: 2214
Reputation: 2811
One way to do this is with DataFrame.update, passing overwrite=False so that only the NaN values in df are replaced:
df.update(new_df, overwrite=False)
df.head()
#output:
   A      B
0  1  400.0
1  2    5.0
2  3  600.0
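To make the behaviour of update explicit, here is a minimal, self-contained sketch using the frames from the question. The variable name result is only there for illustration, and the sketch assumes a reasonably recent pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [400, np.nan, 600]})
new_df = pd.DataFrame({'B': [4, 5, 6], 'C': [7, 8, 9]})

# update() modifies df in place and returns None.
# With overwrite=False, only entries that are NaN in df are replaced.
result = df.update(new_df, overwrite=False)

print(result)            # None, because the change happens in place
print(list(df.columns))  # column C from new_df is not added to df
print(df['B'].tolist())  # the NaN in row 1 is filled from new_df
```

Because update works in place, assigning its return value back to df would replace the frame with None.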
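Timings for the three approaches on 3,000 rows (note that each cell also times the construction of both data frames):
%%timeit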
df = pd.DataFrame({'A': [1, 2, 3] * 1000, 'B': [400, np.nan, 600] * 1000})
new_df = pd.DataFrame({'B': [4, 5, 6] * 1000, 'C': [7, 8, 9] * 1000})
df.update(new_df, overwrite=False)
4.24 ms ± 48.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
df = pd.DataFrame({'A': [1, 2, 3] * 1000, 'B': [400, np.nan, 600] * 1000})
new_df = pd.DataFrame({'B': [4, 5, 6] * 1000, 'C': [7, 8, 9] * 1000})
df.fillna(new_df)
6.78 ms ± 229 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
df = pd.DataFrame({'A': [1, 2, 3] * 1000, 'B': [400, np.nan, 600] * 1000})
new_df = pd.DataFrame({'B': [4, 5, 6] * 1000, 'C': [7, 8, 9] * 1000})
df['B'] = np.where(df['B'].isnull(), new_df['B'], df['B'])
3.91 ms ± 153 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
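One thing the fillna timing above does not show: unlike update, fillna returns a new frame by default, so the result has to be assigned to be kept. A small sketch, assuming the frames from the question (the name filled is hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [400, np.nan, 600]})
new_df = pd.DataFrame({'B': [4, 5, 6], 'C': [7, 8, 9]})

# fillna() with a DataFrame fills NaN values from the matching
# index label and column name, and returns a new frame.
filled = df.fillna(new_df)

print(filled['B'].tolist())    # NaN in row 1 replaced by the value from new_df
print(list(filled.columns))    # only the columns of df; C is not added
print(df['B'].isna().sum())    # the original df still contains its NaN
```

To get the same effect as update, write df = df.fillna(new_df).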
Upvotes: 2
Reputation: 16147
import numpy as np
# Keep the existing values in B and take new_df['B'] only where df['B'] is NaN
df['B'] = np.where(df['B'].isnull(), new_df['B'], df['B'])
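A caveat worth keeping in mind with np.where: it works on the underlying arrays by position and ignores index labels, so it only matches the label-aligned methods when both frames have the same index in the same order. The sketch below illustrates the difference with a deliberately shuffled index; the frame other and the names positional and aligned are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [400, np.nan, 600]})
# Hypothetical replacement frame whose index is in a different order.
other = pd.DataFrame({'B': [4, 5, 6]}, index=[1, 2, 0])

# np.where pairs values by position: the NaN at position 1 gets
# the value at position 1 of other['B'].
positional = np.where(df['B'].isnull(), other['B'], df['B'])

# Series.fillna pairs values by index label: the NaN at label 1
# gets the value stored under label 1 in other['B'].
aligned = df['B'].fillna(other['B'])

print(positional.tolist())
print(aligned.tolist())
```

With identical indexes, as in the question, both approaches give the same result.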
Upvotes: 2