Reputation: 438
We can get the best parameters for a model using RandomizedSearchCV.
def test_model():
    # generate_friedman1() is the helper defined in the full code below
    X_train, X_test, y_train, y_test = generate_friedman1()
    result_dfs = []
    model = Ridge()
    search = RandomizedSearchCV(model, space, n_iter=500, scoring='neg_mean_absolute_error', n_jobs=-1, cv=cv)
    result = search.fit(X_train, y_train)
    print('Best Score: %s' % result.best_score_)
    print('Best Hyperparameters: %s' % result.best_params_)
Now I am trying to get the test scores (i.e., MSE and R2) for every parameter combination using the X_test data.
def test_model():
    # generate_friedman1() is the helper defined in the full code below
    X_train, X_test, y_train, y_test = generate_friedman1()
    result_dfs = []
    model = Ridge()
    search = RandomizedSearchCV(model, space, n_iter=500, scoring='neg_mean_absolute_error', n_jobs=-1, cv=cv)
    result = search.fit(X_train, y_train)
    print('Best Score: %s' % result.best_score_)
    print('Best Hyperparameters: %s' % result.best_params_)
    # Predictions on the test set come from the best estimator only
    test_result = search.fit(X_train, y_train).predict(X_test)
    diff_acc = test_result - y_test
    fold_df = pd.DataFrame()
    fold_df["MSE"] = [mean_squared_error(y_test, test_result)]
    fold_df["R2"] = [r2_score(y_test, test_result)]
    result_dfs.append(fold_df)
    rep_df = pd.concat(result_dfs, axis=0, ignore_index=True)
    return rep_df
The output I am getting is
Best Score: -0.495580216817403
Best Hyperparameters: {'alpha': 28.590361345568553, 'fit_intercept': False, 'normalize': True, 'solver': 'cholesky'}
        MSE        R2
0  0.460333  0.504366
But I want to get the test scores for all the different parameter configurations sampled from the param space and save them in a df.
More specifically, say I have n_iter=500 in my program, so 500 parameter settings are sampled. I want to use each of these settings in the line below to fit and predict, so that I end up with 500 MSE and R2 values, one pair per parameter combination.
test_result = search.fit(X_train, y_train).predict(X_test)
Could you tell me how I can get the test scores for every parameter combination using RandomizedSearchCV?
Full code
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import numpy as np
import pandas as pd
from scipy.stats import loguniform
from sklearn.model_selection import RepeatedKFold
from sklearn.model_selection import RandomizedSearchCV
# define search space
space = dict()
space['solver'] = ['svd', 'cholesky', 'lsqr', 'sag']
space['alpha'] = loguniform(1e-5, 100)
space['fit_intercept'] = [True, False]
space['normalize'] = [True, False]
cv = RepeatedKFold(n_splits=5, n_repeats=3)
def generate_friedman1():
    # Build the synthetic Friedman #1 regression data and split it 70/30
    data = datasets.make_friedman1(n_samples=300)
    X = data[0]
    y = data[1]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
    return X_train, X_test, y_train, y_test

def test_model():
    X_train, X_test, y_train, y_test = generate_friedman1()
    result_dfs = []
    model = Ridge()
    search = RandomizedSearchCV(model, space, n_iter=500, scoring='neg_mean_absolute_error', n_jobs=-1, cv=cv)
    result = search.fit(X_train, y_train)
    print('Best Score: %s' % result.best_score_)
    print('Best Hyperparameters: %s' % result.best_params_)
    # Predictions on the test set come from the best estimator only
    test_result = search.fit(X_train, y_train).predict(X_test)
    diff_acc = test_result - y_test
    fold_df = pd.DataFrame()
    fold_df["MSE"] = [mean_squared_error(y_test, test_result)]
    fold_df["R2"] = [r2_score(y_test, test_result)]
    result_dfs.append(fold_df)
    rep_df = pd.concat(result_dfs, axis=0, ignore_index=True)
    return rep_df

if __name__ == "__main__":
    print(test_model())
Upvotes: 1
Views: 742
Reputation: 755
You can save all the sampled parameter settings in a variable:
all_param_combination = search.cv_results_['params']
Then you can use a loop to fit and predict with a model for each setting:
fold_dfs = []  # one single-row DataFrame per parameter setting
for i in range(len(all_param_combination)):
    # Refit a fresh Ridge with this setting on the training set, then score on X_test
    reg_preds = Ridge(**all_param_combination[i]).fit(X_train, y_train).predict(X_test)
    acc_diff = reg_preds - y_test
    fold_df = pd.DataFrame()
    fold_df["MSE"] = [mean_squared_error(y_test, reg_preds)]
    fold_df["R2"] = [r2_score(y_test, reg_preds)]
    fold_dfs.append(fold_df)
rep_df = pd.concat(fold_dfs, axis=0, ignore_index=True)
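For reference, here is a minimal self-contained sketch of the same loop that also keeps each parameter setting next to its test scores. It is only an illustration under a few assumptions of mine: the search space is reduced (no `normalize`, which recent scikit-learn releases no longer accept for Ridge), `n_iter` is cut to 20 and `cv=5` is used to keep it quick, and the names `rows` and `test_scores` are made up for the example.

```python
import pandas as pd
from scipy.stats import loguniform
from sklearn.datasets import make_friedman1
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic data, split once into a training set and a held-out test set
X, y = make_friedman1(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Assumption: a reduced search space and a small n_iter, for illustration only
space = {
    "solver": ["svd", "cholesky", "lsqr"],
    "alpha": loguniform(1e-5, 100),
    "fit_intercept": [True, False],
}
search = RandomizedSearchCV(
    Ridge(), space, n_iter=20, scoring="neg_mean_absolute_error",
    cv=5, random_state=0,
)
search.fit(X_train, y_train)

# Refit one Ridge per sampled setting and score it on the held-out test set
rows = []
for params in search.cv_results_["params"]:
    preds = Ridge(**params).fit(X_train, y_train).predict(X_test)
    rows.append({
        **params,
        "MSE": mean_squared_error(y_test, preds),
        "R2": r2_score(y_test, preds),
    })

# One row per parameter setting, with its parameters and test scores
test_scores = pd.DataFrame(rows)
print(test_scores.sort_values("MSE").head())
```

Building a list of dicts and creating the DataFrame once avoids concatenating many single-row frames, and the parameter columns make it easy to see which setting produced which score.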
Upvotes: 1
Reputation: 1482
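The .cv_results_ attribute holds the results for each CV fold and each parameter setting tested. For example, search.cv_results_['params'] holds a list of dictionaries, one per parameter setting sampled in the randomized search, and search.cv_results_['split0_test_score'] holds the score each setting got on split 0. Note that these "test" scores are computed on the held-out cross-validation folds of the training data, not on X_test.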
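As a rough sketch of what that looks like in practice, the snippet below loads cv_results_ into a DataFrame and asks the search to track both MSE and R2 for every candidate through multi-metric scoring. The scorer labels `mse` and `r2`, the reduced search space, and the small `n_iter` are my own choices for illustration, not something from the question.

```python
import pandas as pd
from scipy.stats import loguniform
from sklearn.datasets import make_friedman1
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

X, y = make_friedman1(n_samples=300, random_state=0)

# Assumption: a small illustrative search space
space = {"alpha": loguniform(1e-5, 100), "fit_intercept": [True, False]}

search = RandomizedSearchCV(
    Ridge(),
    space,
    n_iter=10,
    # With a dict of scorers, cv_results_ gets one set of columns per label
    scoring={"mse": "neg_mean_squared_error", "r2": "r2"},
    refit="mse",  # with several metrics, refit must name the one to optimize
    cv=5,
    random_state=0,
)
search.fit(X, y)

# Every key in cv_results_ becomes a column; every row is one parameter setting
cv_df = pd.DataFrame(search.cv_results_)
summary = cv_df[["params", "mean_test_mse", "mean_test_r2", "split0_test_mse"]].copy()
# The scorer reports negated MSE, so flip the sign to get a plain MSE
summary["cv_mse"] = -summary["mean_test_mse"]
print(summary.head())
```

These columns describe cross-validation performance on the data passed to fit; scores on a separate test set still need an explicit refit-and-predict per setting.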
If you need further help, please specify the columns of the DataFrame you'd like to see and I can assist if needed!
Upvotes: 0