Reputation: 508
I have the following very simple code that tries to model a small dataset:
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
data = {'Feature_A': [1, 2, 3, 4], 'Feature_B': [7, 8, 9, 10], 'Feature_C': [2, 3, 4, 5], 'Label': [7, 7, 8, 9]}
data = pd.DataFrame(data)
data_labels = data['Label']
data = data.drop(columns=['Label'])
pipeline = Pipeline([('imputer', SimpleImputer()),
                     ('std_scaler', StandardScaler())])
data_prepared = pipeline.fit_transform(data)
lin_reg = LinearRegression()
lin_grid = {"n_jobs": [20, 50]}
error = "max_error"
grid_search = GridSearchCV(lin_reg, param_grid=lin_grid, verbose=3, cv=2, refit=True, scoring=error, return_train_score=True)
grid_search.fit(data_prepared, data_labels)
print(grid_search.best_estimator_.coef_)
print(grid_search.best_estimator_.intercept_)
print(list(data_labels))
print(list(grid_search.best_estimator_.predict(data_prepared)))
That gives me the following results:
[0.2608746 0.2608746 0.2608746]
7.75
[7, 7, 8, 9]
[6.7, 7.4, 8.1, 8.799999999999999]
From there, is there a way of computing the feature values that would give me the maximum predicted label, within the boundaries of the dataset?
Upvotes: 0
Views: 445
Reputation: 497
If I understand your question correctly, this should work:
import numpy as np
# predict on the same prepared data the model was fitted on
id_max = np.argmax(grid_search.predict(data_prepared))  # positional index of the row with the highest predicted label
print(data.iloc[id_max])  # feature values of that row (iloc, since argmax returns a position, not an index label)
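With the sample data above, the predictions are roughly [6.7, 7.4, 8.1, 8.8], so id_max should be 3 and the print should show the last row, something like:
Feature_A     4
Feature_B    10
Feature_C     5
Name: 3, dtype: int64
Note this only searches the rows already present in the dataset, which seems to be what you mean by staying within its boundaries.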
Upvotes: 1