KSHMR

Reputation: 809

Pandas fill null values not knowing the index of the Series with array of same shape of null values

I'm trying to fill some NaN values in one column of my dataframe. From what I understand from the docs, I should be able to pass a Pandas series to fillna, and Pandas will then fill my NaN's with the Series I provided.

The code is like this:

XTrain_pd[class_name] = XTrain_pd[class_name].fillna(pd.Series(train_pred))

So fill in the NaN values based on the values from train_pred.

I made sure that the length of train_pred and the number of NaN's to be filled is the same:

print(XTrain_pd[class_name].isna().sum(),print(train_pred.shape))

This outputs:

(9,)
9 None

I also printed out XTrain_pd before and after using fillna on the NaN values.

Left image is before fillna, right image is after fillna.

Some mysterious things happen here. Firstly, only one NaN value is imputed, in row #6. Secondly, my pd.NA values get converted to np.nan values. What is going on here?

[Images: "Before" and "After"]

Upvotes: 0

Views: 778

Answers (1)

Miguel Trejo

Reputation: 6667

TL;DR: Use .loc to select the NaN rows and assign the predictions to them: df.loc[df.class_name.isna(), 'class_name'] = train_pred

Consider a dataframe with two null values, at index 3 and 9:

import numpy as np
import pandas as pd

# Example frame with NaN in col_float at index 3 and 9
d = {
    'col_str': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'],
    'col_float': [1, 2, 3, np.nan, 5, 6, 7, 8, 9, np.nan]
}
df = pd.DataFrame(d)
df
>>>
  col_str  col_float
0       a        1.0
1       b        2.0
2       c        3.0
3       d        NaN
4       e        5.0
5       f        6.0
6       g        7.0
7       h        8.0
8       i        9.0
9       j        NaN

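If you want to replace the null values with the predictions in train_pred, select the NaN rows of col_float with a boolean mask and assign the predictions to them. The assignment is positional, so train_pred must contain exactly one value per NaN row, in row order.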

train_pred = [4.0, 10.0]

df.loc[df.col_float.isna(), 'col_float'] = train_pred

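If you were to use fillna(), you would need to pass a Series whose index labels match the index labels of the NaN rows, because fillna() aligns the Series on index rather than on position. pd.Series(train_pred) gets the default index 0, 1, 2, ..., so only NaN rows that happen to carry one of those labels are filled.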
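To illustrate the index alignment, here is a minimal sketch using the same example frame. The names misaligned, aligned and nan_index are only for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col_str": list("abcdefghij"),
    "col_float": [1, 2, 3, np.nan, 5, 6, 7, 8, 9, np.nan],
})
train_pred = [4.0, 10.0]

# A default-indexed Series has labels 0 and 1. Rows 0 and 1 are not NaN,
# and the NaN rows (3 and 9) have no matching label, so nothing is filled.
misaligned = df["col_float"].fillna(pd.Series(train_pred))

# Give the Series the labels of the NaN rows so that fillna can align it.
nan_index = df.index[df["col_float"].isna()]
aligned = df["col_float"].fillna(pd.Series(train_pred, index=nan_index))

print(misaligned.isna().sum())  # 2 NaN values remain
print(aligned.isna().sum())     # 0 NaN values remain
```

This gives the same result as the .loc assignment above; the .loc version is simply shorter because it does not need the intermediate Series.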

Upvotes: 1
