Opps_0

Reputation: 438

How to use 'predict' at the time of Random Search for all different param combinations

We can get the best parameters for a model using RandomizedSearchCV.

def test_model():
    X_train, X_test, y_train, y_test = generate_friedman1()
    result_dfs = []
    model = Ridge()
    search = RandomizedSearchCV(model, space, n_iter=500, scoring='neg_mean_absolute_error', n_jobs=-1, cv=cv)
    result = search.fit(X_train, y_train)
    print('Best Score: %s' % result.best_score_)
    print('Best Hyperparameters: %s' % result.best_params_)

Now, I am trying to get the test scores (e.g., MSE, R2) for every parameter combination, evaluated on the X_test data.

def test_model():
    X_train, X_test, y_train, y_test = generate_friedman1()
    result_dfs = []
    model = Ridge()
    search = RandomizedSearchCV(model, space, n_iter=500, scoring='neg_mean_absolute_error', n_jobs=-1, cv=cv)
    result = search.fit(X_train, y_train)
    print('Best Score: %s' % result.best_score_)
    print('Best Hyperparameters: %s' % result.best_params_)
    
    test_result = search.fit(X_train, y_train).predict(X_test)  # predicts with the best estimator only
    diff_acc = test_result - y_test
    fold_df = pd.DataFrame()
    fold_df["MSE"] = [mean_squared_error(y_test, test_result)]
    fold_df["R2"] = [r2_score(y_test, test_result)]
    result_dfs.append(fold_df)
    rep_df = pd.concat(result_dfs, axis=0, ignore_index=True)
    return rep_df

The output I am getting is

Best Score: -0.495580216817403
Best Hyperparameters: {'alpha': 28.590361345568553, 'fit_intercept': False, 'normalize': True, 'solver': 'cholesky'}
        MSE        R2
0  0.460333  0.504366

But I want to get the test scores for all the different param configurations sampled from the param space and save them in a DataFrame.

More specifically: say I have n_iter=500 in my program, so there are 500 combinations of the param settings. I want to use each of these combinations in the line below to fit and predict. Finally, I will have 500 MSE and R2 values, one for each param combination.

test_result = search.fit(X_train, y_train).predict(X_test)

Could you tell me how I can get all test scores for every different combination of the parameters using RandomizedSearchCV?

Full code

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import numpy as np
import pandas as pd
from scipy.stats import loguniform
from sklearn.model_selection import RepeatedKFold
from sklearn.model_selection import RandomizedSearchCV


# define search space
space = dict()
space['solver'] = ['svd', 'cholesky', 'lsqr', 'sag']
space['alpha'] = loguniform(1e-5, 100)
space['fit_intercept'] = [True, False]
space['normalize'] = [True, False]

cv = RepeatedKFold(n_splits=5, n_repeats=3)

def generate_friedman1():
    data = datasets.make_friedman1(n_samples=300)
    X = data[0]
    y = data[1]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
    return X_train, X_test, y_train, y_test

def test_model():
    X_train, X_test, y_train, y_test = generate_friedman1()
    result_dfs = []
    model = Ridge()
    search = RandomizedSearchCV(model, space, n_iter=500, scoring='neg_mean_absolute_error', n_jobs=-1, cv=cv)
    result = search.fit(X_train, y_train)
    print('Best Score: %s' % result.best_score_)
    print('Best Hyperparameters: %s' % result.best_params_)
    
    test_result = search.fit(X_train, y_train).predict(X_test)  # predicts with the best estimator only
    diff_acc = test_result - y_test
    fold_df = pd.DataFrame()
    fold_df["MSE"] = [mean_squared_error(y_test, test_result)]
    fold_df["R2"] = [r2_score(y_test, test_result)]
    result_dfs.append(fold_df)
    rep_df = pd.concat(result_dfs, axis=0, ignore_index=True)
    return rep_df

if __name__ == "__main__":
    print(test_model())

Upvotes: 1

Views: 742

Answers (2)

0Knowledge

Reputation: 755

You can save all of the sampled params in a variable:

all_param_combination = search.cv_results_['params']

Then you can use a loop to refit the model with each parameter combination and predict on the test data:

    fold_dfs = []
    for params in all_param_combination:
        # refit Ridge with this parameter combination and predict on the held-out test set
        reg_preds = Ridge(**params).fit(X_train, y_train).predict(X_test)
        fold_df = pd.DataFrame()
        fold_df["MSE"] = [mean_squared_error(y_test, reg_preds)]
        fold_df["R2"] = [r2_score(y_test, reg_preds)]
        fold_dfs.append(fold_df)
    rep_df = pd.concat(fold_dfs, axis=0, ignore_index=True)
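If you also want to see which hyperparameters produced each row, the sampled params can be joined onto the score table (a minimal sketch, reusing all_param_combination and rep_df from the loop above):

    # Each entry of all_param_combination is a dict of sampled hyperparameters,
    # so the list converts directly into a DataFrame with one row per candidate.
    params_df = pd.DataFrame(all_param_combination)

    # params_df and rep_df share the same candidate order, so a column-wise
    # concat lines each MSE/R2 score up with the params that produced it.
    scores_with_params = pd.concat([params_df, rep_df], axis=1)
    print(scores_with_params.sort_values("MSE").head())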

Upvotes: 1

TC Arlen

Reputation: 1482

The attribute .cv_results_ holds the results for every CV fold and every parameter combination tested. For example, search.cv_results_['params'] holds a list of the parameter dicts sampled in the randomized search, and search.cv_results_['split0_test_score'] holds the score each of them got on split 0.
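For example, the whole dict converts straight into a DataFrame with one row per sampled combination (a minimal sketch, assuming search has already been fit as in the question; note these are cross-validation scores under the chosen scoring, here negated MAE, not test-set MSE/R2):

import pandas as pd

# cv_results_ is a dict of parallel arrays, one entry per sampled candidate.
results_df = pd.DataFrame(search.cv_results_)

# One row per n_iter candidate: its params, its mean CV score, and its rank.
print(results_df[['params', 'mean_test_score', 'rank_test_score']])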

If you need further help, please specify the columns of the DataFrame you'd like to see and I can assist!

Upvotes: 0
