Reputation: 2017
I was trying the Random Forest algorithm on the Boston dataset to predict the house prices (medv) with the help of sklearn's RandomForestRegressor. In all I tried 3 iterations, as below.
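All three iterations assume the Boston data has already been loaded and split into X_train, X_test, y_train and y_test. A minimal setup sketch along those lines (the test_size and random_state below are placeholders for illustration, not necessarily the exact values behind the reported numbers):
import numpy as np
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

# Boston housing data; the target is the median house value (medv)
boston = load_boston()
X, y = boston.data, boston.target

# Hypothetical split; the original split parameters are not shown in the post
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)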
Iteration 1: Using the model with default hyperparameters
#1. import the class/model
from sklearn.ensemble import RandomForestRegressor
#2. Instantiate the estimator
RFReg = RandomForestRegressor(random_state = 1, n_jobs = -1)
#3. Fit the model with data aka model training
RFReg.fit(X_train, y_train)
#4. Predict the response for a new observation
y_pred = RFReg.predict(X_test)
y_pred_train = RFReg.predict(X_train)
Results of Iteration 1
{'RMSE Test': 2.9850839211419435, 'RMSE Train': 1.2291604936401441}
Iteration 2: I used RandomizedSearchCV to get optimum values of hyper-parameters
from sklearn.ensemble import RandomForestRegressor
RFReg = RandomForestRegressor(n_estimators = 500, random_state = 1, n_jobs = -1)
param_grid = {
    'max_features' : ["auto", "sqrt", "log2"],
    'min_samples_split' : np.linspace(0.1, 1.0, 10),
    'max_depth' : [x for x in range(1,20)]
}
from sklearn.model_selection import RandomizedSearchCV
CV_rfc = RandomizedSearchCV(estimator = RFReg, param_distributions = param_grid, n_jobs = -1, cv = 10, n_iter = 50)
CV_rfc.fit(X_train, y_train)
So I got the best hyperparameters as follows
CV_rfc.best_params_
#{'min_samples_split': 0.1, 'max_features': 'auto', 'max_depth': 18}
CV_rfc.best_score_
#0.8021713812777814
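As a side note, since refit=True is the default, the fitted search object already contains an estimator refit on the whole training set, so the tuned model can also be used directly (an equivalent shortcut, shown only for reference):
# RandomizedSearchCV refits the best configuration on all of X_train by default (refit=True),
# so predictions can come straight from the search object:
y_pred = CV_rfc.best_estimator_.predict(X_test)
y_pred_train = CV_rfc.best_estimator_.predict(X_train)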
So I trained a new model with the best hyperparameters as below
#1. import the class/model
from sklearn.ensemble import RandomForestRegressor
#2. Instantiate the estimator
RFReg = RandomForestRegressor(n_estimators = 500, random_state = 1, n_jobs = -1, min_samples_split = 0.1, max_features = 'auto', max_depth = 18)
#3. Fit the model with data aka model training
RFReg.fit(X_train, y_train)
#4. Predict the response for a new observation
y_pred = RFReg.predict(X_test)
y_pred_train = RFReg.predict(X_train)
Results of Iteration 2
{'RMSE Test': 3.2836794902147926, 'RMSE Train': 2.71230367772569}
Iteration 3: I used GridSearchCV to get optimum values of hyper-parameters
from sklearn.ensemble import RandomForestRegressor
RFReg = RandomForestRegressor(n_estimators = 500, random_state = 1, n_jobs = -1)
param_grid = {
    'max_features' : ["auto", "sqrt", "log2"],
    'min_samples_split' : np.linspace(0.1, 1.0, 10),
    'max_depth' : [x for x in range(1,20)]
}
from sklearn.model_selection import GridSearchCV
CV_rfc = GridSearchCV(estimator = RFReg, param_grid = param_grid, cv = 10, n_jobs = -1)
CV_rfc.fit(X_train, y_train)
So I got the best hyperparameters as follows
CV_rfc.best_params_
#{'max_depth': 12, 'max_features': 'auto', 'min_samples_split': 0.1}
CV_rfc.best_score_
#0.8021820114800677
Results of Iteration 3
{'RMSE Test': 3.283690568225705, 'RMSE Train': 2.712331014201783}
My function to evaluate RMSE
from sklearn.metrics import mean_squared_error

def model_evaluate(y_train, y_test, y_pred, y_pred_train):
    #RMSE Test
    rmse_test = np.sqrt(mean_squared_error(y_test, y_pred))
    #RMSE Train
    rmse_train = np.sqrt(mean_squared_error(y_train, y_pred_train))
    metrics = {
        'RMSE Test': rmse_test,
        'RMSE Train': rmse_train}
    return metrics
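The dictionaries reported for each iteration come from calling this helper, e.g.:
metrics = model_evaluate(y_train, y_test, y_pred, y_pred_train)
print(metrics)   # {'RMSE Test': ..., 'RMSE Train': ...}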
So I had the below questions after 3 iterations:
1. Why are the results of the tuned model worse than the model with default parameters even when I am using RandomSearchCV and GridSearchCV? Ideally the model should give good results when tuned with cross-validation.
2. I know that cross-validation will take place only for the combination of values present in param_grid. There could be values which are good but not included in my param_grid. So how do I deal with this kind of situation?
3. How do I decide what range of values I should try for max_features, min_samples_split, max_depth or for that matter any hyper-parameters in a machine learning model to increase its accuracy? (So that I can at least get a better tuned model than the model with default hyper-parameters.)
Upvotes: 5
Views: 8725
Reputation: 5906
Why are the results of the tuned model worse than the model with default parameters even when I am using RandomSearchCV and GridSearchCV? Ideally the model should give good results when tuned with cross-validation.
Your second question kind of answers your first one, but I tried to reproduce your results on the Boston dataset: I got {'test_rmse': 3.987, 'train_rmse': 1.442} with default parameters, {'test_rmse': 3.98, 'train_rmse': 3.426} for 'tuned' parameters with random search, and {'test_rmse': 3.993, 'train_rmse': 3.481} with grid search. Then I used hyperopt with the following parameter space:
{'max_depth': hp.choice('max_depth', range(1, 100)),
'max_features': hp.choice('max_features', range(1, x_train.shape[1])),
'min_samples_split': hp.uniform('min_samples_split', 0.1, 1)}
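A minimal sketch of such a hyperopt run (the objective here, 10-fold cross-validated RMSE with n_estimators=500 reused from your code, is just one reasonable choice, not necessarily the exact setup behind the numbers below):
import numpy as np
from hyperopt import fmin, tpe, hp, Trials
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

space = {'max_depth': hp.choice('max_depth', range(1, 100)),
         'max_features': hp.choice('max_features', range(1, x_train.shape[1])),
         'min_samples_split': hp.uniform('min_samples_split', 0.1, 1)}

def objective(params):
    # hyperopt passes the sampled hyperparameter values; score them by cross-validated RMSE
    model = RandomForestRegressor(n_estimators=500, random_state=1, n_jobs=-1, **params)
    mse = -cross_val_score(model, x_train, y_train,
                           scoring='neg_mean_squared_error', cv=10).mean()
    return np.sqrt(mse)

trials = Trials()
best = fmin(objective, space, algo=tpe.suggest, max_evals=200, trials=trials)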
After about 200 runs the results looked like this, so I widened the space to
'min_samples_split': hp.uniform('min_samples_split', 0.01, 1)
which got me the best result of {'test_rmse':3.278, 'train_rmse':1.716}
with min_samples_split equal to 0.01. According to the documentation, when min_samples_split is given as a fraction, the minimum number of samples per split is ceil(min_samples_split * n_samples), which in our case gives np.ceil(0.1 * len(x_train)) = 34, which could be kind of big for a small dataset like this.
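A quick check of how fast that threshold grows with the fraction (reusing x_train from above):
import numpy as np

n_train = len(x_train)
for frac in (0.01, 0.1, 0.5):
    # minimum number of samples a node must contain before it can be split
    print(frac, '->', int(np.ceil(frac * n_train)))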
I know that cross-validation will take place only for the combination of values present in param_grid. There could be values which are good but not included in my param_grid. So how do I deal with this kind of situation?
How do I decide what range of values I should try for max_features, min_samples_split, max_depth or for that matter any hyper-parameters in a machine learning model to increase its accuracy? (So that I can at least get a better tuned model than the model with default hyper-parameters.)
You can't know this in advance, so you have to do research for each algorithm to see what kind of parameter spaces are usually searched (a good source for this is Kaggle, e.g. google "kaggle kernel random forest"), merge them, account for your dataset's features, and optimize over them using some kind of Bayesian optimization algorithm (there are multiple existing libraries for this), which tries to optimally select new parameter values to evaluate.
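As one concrete illustration, a sketch with scikit-optimize's BayesSearchCV (the library choice and the ranges here are just an example of such a Bayesian search, not the only option):
from skopt import BayesSearchCV
from skopt.space import Real, Integer, Categorical
from sklearn.ensemble import RandomForestRegressor

# Example search space; widen or narrow it based on what similar kernels use
search_spaces = {
    'max_depth': Integer(1, 100),
    'max_features': Categorical(['auto', 'sqrt', 'log2']),
    'min_samples_split': Real(0.01, 1.0, prior='uniform'),
}

opt = BayesSearchCV(
    RandomForestRegressor(n_estimators=500, random_state=1, n_jobs=-1),
    search_spaces,
    n_iter=50,
    cv=10,
    scoring='neg_mean_squared_error',
    random_state=1,
)
opt.fit(x_train, y_train)
print(opt.best_params_, opt.best_score_)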
Upvotes: 3