Reputation: 9
I am trying to build a linear regression model, but first I want to use SimpleImputer to replace the NaN values with each column's mean. After I run the code, there are still NaN values. I have the following code:
# ########## Modeling ###########
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# pipe model and SimpleImputer
model = make_pipeline(SimpleImputer(missing_values=np.nan, strategy='mean'),
                      LinearRegression())
# split the data into train/test:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)
# print shapes:
print("Training data is", X_train.shape)
print("Training target is", y_train.shape)
print("test data is", X_test.shape)
print("test target is", y_test.shape)
X_train
Upvotes: 0
Views: 1275
Reputation: 29
I realize this is old, but I had a similar issue. Changing it to SimpleImputer(missing_values=None) worked for me!
Upvotes: 2
Reputation: 16876
missing_values for the SimpleImputer class should be set to np.nan (not to "NaN" as in your code). "NaN" is a string, while np.nan represents a null value. You should use "NaN" only if your null values are represented as the string "NaN".
missing_values: number, string, np.nan (default) or None
The placeholder for the missing values. All occurrences of missing_values will be imputed. For pandas’ dataframes with nullable integer dtypes with missing values, missing_values should be set to np.nan, since pd.NA will be converted to np.nan.
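As a minimal sketch of the above (the toy array and the names X_imputed and predictions are made up for illustration and are not from the question), the imputer with missing_values=np.nan returns a new array with the column means filled in. It may also be worth noting that, with the default copy=True, the input array is left unchanged, and that an imputer inside a pipeline only runs during fit/predict, so printing the original data afterwards would still show NaN:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Toy data, invented for this example: one NaN in each column.
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan],
              [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

# Stand-alone imputer: fit_transform returns a NEW array in which each NaN
# is replaced by its column mean; X itself is not modified (copy=True).
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_imputed = imputer.fit_transform(X)

print("NaN count in original X:", np.isnan(X).sum())
print("NaN count after imputing:", np.isnan(X_imputed).sum())

# Inside a pipeline, the imputation is applied internally during fit and
# predict, so the regression never sees NaN even though X still contains them.
model = make_pipeline(SimpleImputer(missing_values=np.nan, strategy="mean"),
                      LinearRegression())
model.fit(X, y)
predictions = model.predict(X)
```

If you want to inspect the imputed training data itself, look at the array returned by fit_transform (or transform) rather than the original X_train.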
Upvotes: 0