Reputation: 810
Using pandas, I want to fill the missing values of column b in the following DataFrame df1 with the values from column a:
import pandas as pd
import numpy as np
df1 = pd.DataFrame(data=dict(a=[1, 2, 3], b=[1, np.nan, 3]))
mask = pd.isnull(df1.b)
It seems that I can do this in three different ways:
# first
df1.loc[mask, 'b'] = df1.loc[mask, 'a']
# second
df1.loc[mask, 'b'] = df1.a
# third
df1.fillna(value=dict(b=df1.a), inplace=True)
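Running each approach on a fresh copy of the frame is one way to check that they agree. This is a minimal sketch; the helper make_df and the names df_first, df_second and df_third are hypothetical and exist only for the comparison.

```python
import numpy as np
import pandas as pd


def make_df():
    # hypothetical helper: a fresh copy of the example frame for each approach
    return pd.DataFrame(data=dict(a=[1, 2, 3], b=[1, np.nan, 3]))


# first: assign the masked slice of a
df_first = make_df()
mask = df_first["b"].isna()
df_first.loc[mask, "b"] = df_first.loc[mask, "a"]

# second: assign the whole column a; .loc aligns on the index,
# so only the masked rows of b change
df_second = make_df()
mask = df_second["b"].isna()
df_second.loc[mask, "b"] = df_second["a"]

# third: fillna with a per-column dict of fill values
# (inplace on the DataFrame itself, not on a column selection)
df_third = make_df()
df_third.fillna(value={"b": df_third["a"]}, inplace=True)

print(df_first.equals(df_second), df_first.equals(df_third))
```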
All of them lead to the same result. Is there a recommended method?
Thanks.
Upvotes: 0
Views: 248
Reputation: 3710
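Another alternative is to use pandas where(), which keeps the values where the condition is True and takes them from other everywhere else:
# NaN never compares equal to anything (== np.nan is always False), so test with notna()
df1["b"] = df1["b"].where(df1["b"].notna(), other=df1["a"])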
print(df1)
   a    b
0  1  1.0
1  2  2.0
2  3  3.0
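Comparing a column against np.nan with == does not find missing values, because NaN is not equal to anything, itself included. With the example frame this is easy to miss, since the non-missing values of b already equal a. The sketch below uses a hypothetical frame demo whose b differs from a, so the effect becomes visible, and also shows Series.mask(), the inverse of where(), as another way to express the same fill.

```python
import numpy as np
import pandas as pd

# hypothetical frame: b differs from a so that over-replacement is visible
demo = pd.DataFrame(data=dict(a=[1, 2, 3], b=[10, np.nan, 30]))

# the == comparison is False on every row, NaN included
print((demo["b"] == np.nan).tolist())  # [False, False, False]

# where() replaces values where the condition is False, so with an
# always-False condition every value of b is taken from a
buggy = demo["b"].where(demo["b"] == np.nan, other=demo["a"])

# testing for missing values explicitly only touches the NaN row
fixed = demo["b"].where(demo["b"].notna(), other=demo["a"])

# mask() replaces values where the condition is True
fixed_mask = demo["b"].mask(demo["b"].isna(), demo["a"])

print(buggy.tolist())  # every value now comes from a
print(fixed.tolist())  # only the missing value comes from a
```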
Upvotes: 0
Reputation: 323226
Let's time the three approaches:
import pandas as pd
import numpy as np
df1 = pd.DataFrame(data=dict(a=[1, 2, 3], b=[1, np.nan, 3]))
mask = pd.isnull(df1.b)
%timeit df1.loc[mask, 'b'] = df1.loc[mask, 'a']
1000 loops, best of 3: 1.15 ms per loop
%timeit df1.loc[mask, 'b'] = df1.a
1000 loops, best of 3: 1.16 ms per loop
%timeit df1.fillna(value=dict(b=df1.a), inplace=True)
1000 loops, best of 3: 215 µs per loop
The third one (fillna) is the fastest of the three.
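The %timeit runs above reuse the same df1, so every loop after the very first one works on a frame where b no longer contains NaN, which may not match a real workload. A sketch that rebuilds the frame on every call, using the standard library timeit module instead of the IPython magic, is below. The function names are hypothetical, each timing includes the cost of building the frame (fill_baseline measures that alone), and the numbers will vary with the machine and pandas version, so no results are claimed here.

```python
import timeit

import numpy as np
import pandas as pd


def make_df():
    # rebuild the frame so every run starts with a missing value in b
    return pd.DataFrame(data=dict(a=[1, 2, 3], b=[1, np.nan, 3]))


def fill_baseline():
    # construction cost alone, for reference
    return make_df()


def fill_loc_slice():
    df = make_df()
    mask = df["b"].isna()
    df.loc[mask, "b"] = df.loc[mask, "a"]
    return df


def fill_loc_column():
    df = make_df()
    mask = df["b"].isna()
    df.loc[mask, "b"] = df["a"]
    return df


def fill_fillna_dict():
    df = make_df()
    df.fillna(value={"b": df["a"]}, inplace=True)
    return df


for func in (fill_baseline, fill_loc_slice, fill_loc_column, fill_fillna_dict):
    # best of 3 repeats of 200 calls each, reported per call
    best = min(timeit.repeat(func, number=200, repeat=3)) / 200
    print(f"{func.__name__}: {best * 1e6:.1f} µs per call")
```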
EDIT: the method from @Zero:
%timeit df1.b = df1.b.fillna(df1.a)
1000 loops, best of 3: 371 µs per loop
%timeit df1.b.fillna(df1.a, inplace=True)
1000 loops, best of 3: 210 µs per loop
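One caveat on the last form: df1.b.fillna(df1.a, inplace=True) calls an inplace method on the Series selected from df1, which is a chained operation. Recent pandas versions warn that such calls will stop updating df1 once Copy-on-Write is the default, so assigning the result back to the column is the safer spelling. The sketch below shows that, plus Series.combine_first as an alternative that is not from the answer above; the name df2 is hypothetical.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(data=dict(a=[1, 2, 3], b=[1, np.nan, 3]))

# assign the filled Series back to the column instead of calling
# an inplace method on the column selection df1.b
df1["b"] = df1["b"].fillna(df1["a"])

# combine_first: keep b where it is not null, otherwise take a
df2 = pd.DataFrame(data=dict(a=[1, 2, 3], b=[1, np.nan, 3]))
df2["b"] = df2["b"].combine_first(df2["a"])

print(df1)
print(df1.equals(df2))
```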
Upvotes: 2