default-303

Reputation: 395

pandas fillna on specific part of dataframe does not work as intended

I'm trying to fill in missing values in a car dataset.

My dataset has the following columns: name, seats, mileage and price, along with 10 other columns.

For instance, the seats column has some missing values. To fill in the NaN values, I plan to look at the corresponding name column to get the name of the car, find how many seats that car usually has, and replace the NaN values for that car with that number.

Here is my code:

seat_cars = df[df['seats'].isnull()]['name'].unique()

for car in seat_cars:
    mode = df.loc[df['name'] == car, 'seats'].mode()          #returns a series
    if mode.empty == False:
        df.loc[df['name'] == car, 'seats'].fillna(mode[0], inplace = True)

But this approach doesn't seem to work: the non-null count does not change when I run df.info(). For some columns, this method even seems to increase the NaN count.

What am I getting wrong here? Any help is appreciated.

Edit: I changed my code to this:

def fillwithmode(s):
    mode = s.mode()
    if mode.empty == False:
        s.fillna(mode[0])
    return s
    
df['seats'] = df.groupby('name')['seats'].apply(lambda x : fillwithmode(x))

But that still does not fill in the missing values.

Upvotes: 0

Views: 636

Answers (1)

sophocles

Reputation: 13821

IIUC, you want to fill the null values for each car name with that name's mode value. If so, you can use groupby and fillna:

# Initial DF

print(df)
   name  seats  mileage  price
0     a    NaN       72  37095
1     a    3.0       78  20039
2     a    3.0       21  37002
3     a    NaN       79  43251
4     b    3.0       41  31115
5     b    3.0       77  30717
6     b    5.0       73  28443
7     b    NaN       20  40532
8     c    4.0       85  21792
9     c    4.0       51  26383
10    c    4.0       56  29391
11    c    NaN       77  42427
12    d    2.0       53  25393
13    d    NaN       67  22605

# Fill nulls

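df.assign(
    seats = df.groupby(
        ['name']
    ).seats.transform(
        # transform keeps the original row index, so the result aligns with df
        # (apply can return a MultiIndex on recent pandas versions)
        lambda x: x.fillna(x.mode()[0])
    )
)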

Out[18]: 
   name  seats  mileage  price
0     a    3.0       72  37095
1     a    3.0       78  20039
2     a    3.0       21  37002
3     a    3.0       79  43251
4     b    3.0       41  31115
5     b    3.0       77  30717
6     b    5.0       73  28443
7     b    3.0       20  40532
8     c    4.0       85  21792
9     c    4.0       51  26383
10    c    4.0       56  29391
11    c    4.0       77  42427
12    d    2.0       53  25393
13    d    2.0       67  22605

Don't forget to assign the result back (df = df.assign(...)), since assign returns a new DataFrame and leaves df unchanged.
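
As for why the two attempts in the question had no effect: df.loc[mask, 'seats'] returns a new Series, so calling fillna(..., inplace=True) on it fills that temporary object and not df; and in fillwithmode, s.fillna(mode[0]) returns a filled copy that is thrown away before the unfilled s is returned. Below is a minimal sketch of both approaches with those two points addressed. The sample data and the names fill_with_mode, filled and df_loop are made up for illustration, and leaving a group untouched when it has no known seat count at all is my assumption about what you want.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; car "e" has no known seat count at all.
df = pd.DataFrame({
    "name":  ["a", "a", "a", "b", "b", "b", "b", "e"],
    "seats": [np.nan, 3.0, 3.0, 3.0, 3.0, 5.0, np.nan, np.nan],
})

def fill_with_mode(s):
    """Return s with NaNs replaced by its mode; unchanged if there is no mode."""
    mode = s.mode()                  # mode() ignores NaN by default
    if mode.empty:
        return s                     # every value in this group is missing
    return s.fillna(mode.iloc[0])    # return the filled copy, not s itself

# Groupby variant: transform keeps the original row index,
# so the result lines up with df row by row.
filled = df.groupby("name")["seats"].transform(fill_with_mode)

# Loop variant: write the filled values back with a single .loc assignment
# instead of calling fillna(inplace=True) on a temporary Series.
df_loop = df.copy()
for car in df_loop.loc[df_loop["seats"].isna(), "name"].unique():
    mask = df_loop["name"] == car
    mode = df_loop.loc[mask, "seats"].mode()
    if not mode.empty:
        df_loop.loc[mask, "seats"] = df_loop.loc[mask, "seats"].fillna(mode.iloc[0])

# Assign back so that df itself is updated.
df["seats"] = filled
```

Both variants should give the same column. The groupby one is shorter and avoids the Python-level loop over car names, so I would normally prefer it.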

Upvotes: 1
