user27500319

How can I achieve accurate imputation of missing values in a dataset?

I'm working with a dataset of used cars and have run into many missing values in the Fuel_Type column. The possible values include 'Gasoline', 'E85 Flex Fuel', 'Hybrid', 'Diesel', and others. The rows with a missing Fuel_Type include over 4,000 electric vehicles, fewer than 50 gasoline vehicles, and some hybrids. Some entries also contain non-standard placeholder values such as '–' and 'not supported'. Filling these missing values accurately is crucial, because they significantly affect the results of my analysis.
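Placeholders like '–' and 'not supported' have to become real missing values before any imputation; otherwise an imputer treats them as ordinary categories. A minimal sketch, assuming these two strings are the only placeholders (the PLACEHOLDERS list and the sample values are illustrative, not taken from the real dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder list; extend it with any other junk values found in the data
PLACEHOLDERS = ['–', 'not supported']

df = pd.DataFrame({'Fuel_Type': ['Gasoline', '–', 'Hybrid', 'not supported', None]})

# Map every placeholder to NaN so it counts as missing, just like None/NaN
df['Fuel_Type'] = df['Fuel_Type'].replace(PLACEHOLDERS, np.nan)

print(df['Fuel_Type'].isna().sum())  # 3
```

Running df['Fuel_Type'].value_counts(dropna=False) on the real column is a quick way to spot any other placeholder strings worth adding to the list.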

import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer

# Sample DataFrame
data = {
    'Car': ['Toyota', 'Honda', 'Tesla', None, 'Ford'],
    'Fuel_Type': ['Gasoline', 'E85 Flex Fuel', np.nan, 'Hybrid', None],
    'Transmission': ['Automatic', None, 'Automatic', 'Manual', 'Manual']
}

df = pd.DataFrame(data)

# Initial imputation attempt
imputer = SimpleImputer(strategy='most_frequent')
df['Fuel_Type'] = imputer.fit_transform(df[['Fuel_Type']])
print(df)

Answers (1)

mozway

You could fill the missing entries (both NaN and None) with empty strings ('') using fillna, declare '' as the imputer's missing value, and slice the output to make it 1D:

imputer = SimpleImputer(strategy='most_frequent', missing_values='')
df['Fuel_Type'] = imputer.fit_transform(df[['Fuel_Type']].fillna(''))[:, 0]

Output:

      Car      Fuel_Type Transmission
0  Toyota       Gasoline    Automatic
1   Honda  E85 Flex Fuel         None
2   Tesla  E85 Flex Fuel    Automatic
3    None         Hybrid       Manual
4    Ford  E85 Flex Fuel       Manual
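For reference, here is a self-contained version of the single-column fix on the question's sample data. It also explains why Tesla ends up with 'E85 Flex Fuel': the three known values each occur once, and on a tie the most_frequent strategy of SimpleImputer returns the smallest value, which for strings is the alphabetically first one.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    'Car': ['Toyota', 'Honda', 'Tesla', None, 'Ford'],
    'Fuel_Type': ['Gasoline', 'E85 Flex Fuel', np.nan, 'Hybrid', None],
    'Transmission': ['Automatic', None, 'Automatic', 'Manual', 'Manual'],
})

# Replace both NaN and None with '' so the column holds only strings,
# then tell the imputer that '' marks a missing entry
imputer = SimpleImputer(strategy='most_frequent', missing_values='')
filled = imputer.fit_transform(df[['Fuel_Type']].fillna(''))

# fit_transform returns a 2D array of shape (n_rows, 1); keep its only column
df['Fuel_Type'] = filled[:, 0]

print(imputer.statistics_)  # ['E85 Flex Fuel']
print(df)
```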

If you want to handle all columns:

imputer = SimpleImputer(strategy='most_frequent', missing_values='')
df[:] = imputer.fit_transform(df.fillna(''))

Output:

      Car      Fuel_Type Transmission
0  Toyota       Gasoline    Automatic
1   Honda  E85 Flex Fuel    Automatic
2   Tesla  E85 Flex Fuel    Automatic
3    Ford         Hybrid       Manual
4    Ford  E85 Flex Fuel       Manual
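Keep in mind that a single most-frequent value is a blunt tool: if many of the missing rows are electric vehicles, as the question suggests, every one of them receives the same global mode (here Tesla becomes 'E85 Flex Fuel'). A different technique, not part of the answer above, is to impute within groups of a related column. The sketch below assumes the make stored in Car is a useful predictor of fuel type; the group_mode helper and the sample rows (including the 'Electric' label and 'Rivian') are hypothetical, and makes with no known fuel type fall back to the overall mode.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: 'Electric' and 'Rivian' are illustrative values, not from the real data
df = pd.DataFrame({
    'Car': ['Tesla', 'Tesla', 'Tesla', 'Toyota', 'Toyota', 'Ford', 'Ford', 'Rivian'],
    'Fuel_Type': ['Electric', 'Electric', np.nan, 'Hybrid', None, 'Gasoline', np.nan, np.nan],
})

def group_mode(s):
    """Hypothetical helper: most frequent non-missing value in a group, or NaN if there is none."""
    modes = s.mode()  # NaN and None are ignored by default (dropna=True)
    return modes.iloc[0] if not modes.empty else np.nan

# Overall mode of the known values, computed before filling, used as a last resort
overall = df['Fuel_Type'].mode().iloc[0]

# Fill each gap with the most frequent fuel type of the same make...
by_make = df.groupby('Car')['Fuel_Type'].transform(group_mode)
df['Fuel_Type'] = df['Fuel_Type'].fillna(by_make)

# ...and fall back to the overall mode for makes with no known fuel type
df['Fuel_Type'] = df['Fuel_Type'].fillna(overall)

print(df)
```

Grouping by make is only as good as that make's consistency: a brand that sells both gasoline and hybrid models still gets its single most common value for every missing row.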
