Flavien Lambert

Reputation: 810

Recommended method for filling missing values in pandas

Using pandas, I want to fill the missing values of column b in the following DataFrame df1 with the values from column a:

import pandas as pd
import numpy as np

df1 = pd.DataFrame(data=dict(a=[1, 2, 3], b=[1, np.nan, 3]))
mask = pd.isnull(df1.b)

It seems that I can do this in three different ways:

# first
df1.loc[mask, 'b'] = df1.loc[mask, 'a']
# second
df1.loc[mask, 'b'] = df1.a
# third
df1.fillna(value=dict(b=df1.a), inplace=True)

All of them lead to the same result. Is there a recommended method?

Thanks.

Upvotes: 0

Views: 248

Answers (2)

2Obe

Reputation: 3710

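Another alternative is pandas where(), which keeps the values where the condition is True and takes them from other elsewhere. The condition has to be a not-null mask, because a comparison such as df1["b"] == np.nan is always False (NaN never equals itself):

df1["b"] = df1["b"].where(df1["b"].notnull(), other=df1["a"])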

print(df1)

   a    b
0  1  1.0
1  2  2.0
2  3  3.0
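
A minimal sketch of how where() reacts to the two kinds of mask. The values in column a are an assumption chosen to differ from b, so that replacing the wrong rows becomes visible; they are not the data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a differs from b so a wrong mask shows up in the result.
df = pd.DataFrame({"a": [10, 20, 30], "b": [1.0, np.nan, 3.0]})

# NaN never compares equal to anything, so this mask is False everywhere.
eq_mask = df["b"] == np.nan

# where() keeps values where the condition is True and uses `other` elsewhere.
wrong = df["b"].where(eq_mask, other=df["a"])           # every row taken from a
right = df["b"].where(df["b"].notna(), other=df["a"])   # only the NaN row taken from a

print(wrong.tolist())
print(right.tolist())
```

With the data from the question both variants happen to print the same column, because a and b agree wherever b is present.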

Upvotes: 0

BENY

Reputation: 323226

Here are the timings:

import pandas as pd
import numpy as np
df1 = pd.DataFrame(data=dict(a=[1, 2, 3], b=[1, np.nan, 3]))
mask = pd.isnull(df1.b)
%timeit df1.loc[mask, 'b'] = df1.loc[mask, 'a']
1000 loops, best of 3: 1.15 ms per loop
%timeit df1.loc[mask, 'b'] = df1.a
1000 loops, best of 3: 1.16 ms per loop
%timeit df1.fillna(value=dict(b=df1.a), inplace=True)
1000 loops, best of 3: 215 µs per loop

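# The third one is the fastest of the three.
# Note that %timeit reruns each statement on the same df1, so after the
# first run there is no NaN left in b; treat the numbers as indicative.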

EDIT: method from @Zero

%timeit df1.b = df1.b.fillna(df1.a)
1000 loops, best of 3: 371 µs per loop
%timeit df1.b.fillna(df1.a, inplace=True)
1000 loops, best of 3: 210 µs per loop
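
Outside IPython, a comparable measurement could be sketched with the standard timeit module. The helper names (with_loc, with_fillna, time_on_fresh_copy) are hypothetical, and copying the frame inside the timed call is an assumption made here so that every run starts with the NaN still in place; the copy cost is then included equally for each candidate.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({"a": [1, 2, 3], "b": [1.0, np.nan, 3.0]})
mask = base["b"].isna()


def with_loc(df):
    # Masked assignment, as in the first method of the question.
    df.loc[mask, "b"] = df.loc[mask, "a"]
    return df


def with_fillna(df):
    # Column-level fillna with another column, assigned back.
    df["b"] = df["b"].fillna(df["a"])
    return df


def time_on_fresh_copy(func, number=200):
    # Average seconds per call; each call works on a fresh copy of `base`.
    return timeit.timeit(lambda: func(base.copy()), number=number) / number


for func in (with_loc, with_fillna):
    print(f"{func.__name__}: {time_on_fresh_copy(func) * 1e6:.0f} us per call")
```

On a three-row frame the timings are dominated by fixed overhead, so the ranking may change on larger data.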

Upvotes: 2
