Alex Navarro

Reputation: 9

SimpleImputer still returning NaN values in Pandas Dataframe

I am trying to create a linear regression model, but first I want to use SimpleImputer to replace the NaN values with each column's mean. After I run the code, there are still NaN values. I have the following code:

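# ########## Modeling ###########
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# pipeline: mean-impute NaN values, then fit a linear regression
model = make_pipeline(SimpleImputer(missing_values=np.nan, strategy='mean'),
                      LinearRegression())

# split the data into train/test:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)
# print shapes:
print("Training data is", X_train.shape)
print("Training target is", y_train.shape)
print("Test data is", X_test.shape)
print("Test target is", y_test.shape)
X_train  # display the training features (notebook output)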


Upvotes: 0

Views: 1275

Answers (2)

Mike Miller Jr.

Reputation: 29

I realize this is old, but I had a similar issue. Changing to SimpleImputer(missing_values=None) worked for me!

Upvotes: 2

mujjiga

Reputation: 16876

missing_values for the SimpleImputer class should be set to np.nan (not to the string "NaN" as in your code).

"NaN" is a string, while np.nan represents a null value.

You should use "NaN" only if your null values are represented as the string "NaN".
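
To illustrate the np.nan case, here is a minimal sketch. The DataFrame df and its column names are made up for the example and are not taken from the question's data. It only shows that SimpleImputer with missing_values=np.nan fills real NaN entries with the column mean and returns the result as a new array:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data: column "a" has one real missing value (np.nan).
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

# np.nan is the placeholder to look for; strategy="mean" fills with the column mean.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
imputed = imputer.fit_transform(df)  # by default this returns a NumPy array

print(np.isnan(imputed).any())  # False
print(imputed[1, 0])            # 2.0, the mean of 1.0 and 3.0
```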

Official docs

missing_values: number, string, np.nan (default) or None

The placeholder for the missing values. All occurrences of missing_values will be imputed. For pandas’ dataframes with nullable integer dtypes with missing values, missing_values should be set to np.nan, since pd.NA will be converted to np.nan.
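
One thing that may also be relevant here: imputation happens inside the pipeline when it is fit or used for prediction, and the DataFrame passed in is not modified. The sketch below uses made-up data (X, y and the column names x1, x2 are assumptions for illustration) to show that the original frame still contains NaN after fitting, while the imputer step's output does not:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data with two missing entries.
X = pd.DataFrame({"x1": [1.0, 2.0, np.nan, 4.0],
                  "x2": [2.0, np.nan, 6.0, 8.0]})
y = pd.Series([3.0, 5.0, 9.0, 12.0])

model = make_pipeline(SimpleImputer(missing_values=np.nan, strategy="mean"),
                      LinearRegression())
model.fit(X, y)  # imputation is applied internally during fit

# The input DataFrame itself is left untouched.
print(X.isna().sum().sum())  # 2

# To inspect the imputed values, call the fitted imputer step directly.
# make_pipeline names each step after its lowercased class name.
X_imputed = model.named_steps["simpleimputer"].transform(X)
print(np.isnan(X_imputed).any())  # False
```

So displaying X_train after building the pipeline will still show NaN; the filled values only exist in the array the imputer returns.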

Upvotes: 0
