Reputation: 485
I have started learning Gaussian process regression with the scikit-learn library, using my own data points as given below. Though I got a result, it is inaccurate because I did not do hyperparameter optimisation. I did a couple of Google searches and wrote some grid search
code, but the code is not running as expected. I don't know where I made my mistake; please help, and thanks in advance.
A sample of the input and output data is given below:
X_tr= [10.8204 7.67418 7.83013 8.30996 8.1567 6.94831 14.8673 7.69338 7.67702 12.7542 11.847]
y_tr= [1965.21 854.386 909.126 1094.06 1012.6 607.299 2294.55 866.316 822.948 2255.32 2124.67]
X_te= [7.62022 13.1943 7.76752 8.36949 7.86459 7.16032 12.7035 8.99822 6.32853 9.22345 11.4751]
X_tr, y_tr
and X_te
are the training and test data points; they are reshaped and have a type of 'Array of float64'.
Here is my grid search code:
from sklearn.model_selection import GridSearchCV
tuned_parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
                     'C': [1, 10, 100, 1000]},
                    {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]
scores = ['precision', 'recall']
for score in scores:
    print("# Tuning hyper-parameters for %s" % score)
    print()
    clf = GridSearchCV(
        gp(), tuned_parameters, scoring='%s_macro' % score
    )
    clf.fit(X_tr, y_tr)
Here is a sample of my code without hyperparameter optimisation:
import numpy as np
import sklearn.gaussian_process as gp
kernel = gp.kernels.ConstantKernel(1.0, (1e-1, 1e3)) * gp.kernels.RBF(10.0, (1e-3, 1e3))
model = gp.GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10, alpha=0.1, normalize_y=True)
X_tr=np.array([X_tr])
X_te=np.array([X_te])
y_tr=np.array([y_tr])
model.fit(X_tr, y_tr)
params = model.kernel_.get_params()
X_te = X_te.reshape(-1,1)
y_pred, std = model.predict(X_te, return_std=True)
Upvotes: 3
Views: 8062
Reputation: 932
There were a few issues in the code snippet you provided; the one below is a working example:
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.gaussian_process.kernels import RBF, DotProduct
import numpy as np
X_tr = np.array([10.8204, 7.67418, 7.83013, 8.30996, 8.1567, 6.94831, 14.8673, 7.69338, 7.67702, 12.7542, 11.847])
y_tr = np.array([1965.21, 854.386, 909.126, 1094.06, 1012.6, 607.299, 2294.55, 866.316, 822.948, 2255.32, 2124.67])
X_te = np.array([7.62022, 13.1943, 7.76752, 8.36949, 7.86459, 7.16032, 12.7035, 8.99822, 6.32853, 9.22345, 11.4751])
param_grid = [{
    "alpha": [1e-2, 1e-3],
    "kernel": [RBF(l) for l in np.logspace(-1, 1, 2)]
}, {
    "alpha": [1e-2, 1e-3],
    "kernel": [DotProduct(sigma_0) for sigma_0 in np.logspace(-1, 1, 2)]
}]
# scores for regression
scores = ['explained_variance', 'r2']
gp = GaussianProcessRegressor()
for score in scores:
    print("# Tuning hyper-parameters for %s" % score)
    print()
    clf = GridSearchCV(estimator=gp, param_grid=param_grid, cv=4,
                       scoring='%s' % score)
    clf.fit(X_tr.reshape(-1, 1), y_tr)
    print(clf.best_params_)
I would like to break it down now in order to provide some explanation. The first part is the data. You will need more data (presumably you only gave a sample here), and you will also need to rescale it for the Gaussian process to work efficiently.
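For instance, a minimal rescaling sketch with StandardScaler (this is just one reasonable choice of scaler, not the only option):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_tr = np.array([10.8204, 7.67418, 7.83013, 8.30996, 8.1567, 6.94831,
                 14.8673, 7.69338, 7.67702, 12.7542, 11.847]).reshape(-1, 1)

# Fit the scaler on the training data and transform it to zero mean, unit variance.
scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)
print(X_tr_scaled.mean(), X_tr_scaled.std())  # approximately 0.0 and 1.0
```

At prediction time, remember to transform the test data with the same fitted scaler (`scaler.transform(X_te)`), never with a scaler refit on the test set.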
The second part is the param_grid
. The parameter grid can be a dictionary or a list of dictionaries. I used a list of dictionaries since it appears you are interested in testing the performance of different kernels. The granularity of the parameter grid is very low here; once you add more data, I would recommend increasing the granularity by adding more test values for alpha
and increasing the np.logspace
steps as well as its bounds.
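As an illustrative sketch of a denser grid (the specific ranges and step counts below are arbitrary choices, not recommendations for this particular dataset):

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF

# A denser grid: 5 noise levels (alpha) and 10 RBF length scales
# spanning wider bounds than the original two-point grids.
param_grid = [{
    "alpha": np.logspace(-4, 0, 5).tolist(),
    "kernel": [RBF(l) for l in np.logspace(-2, 2, 10)]
}]
# GridSearchCV will evaluate every combination in the grid:
print(len(param_grid[0]["alpha"]) * len(param_grid[0]["kernel"]))  # 50 combinations
```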
The third part is the scores to test. In your snippet you had scores for classification algorithms; I used scores for regression, since regression is what you are doing.
The fourth part runs the model. It should print the best parameters for each score. I couldn't get any reliable fits because the dataset was really limited. Note the reshape of the X_tr
input, since it is one-dimensional.
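Once the search has run, you can also predict on the test points directly, since GridSearchCV refits the best estimator on the full training data by default. A self-contained sketch with your data (using a single-dictionary grid and only the r2 score for brevity):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import GridSearchCV

X_tr = np.array([10.8204, 7.67418, 7.83013, 8.30996, 8.1567, 6.94831,
                 14.8673, 7.69338, 7.67702, 12.7542, 11.847])
y_tr = np.array([1965.21, 854.386, 909.126, 1094.06, 1012.6, 607.299,
                 2294.55, 866.316, 822.948, 2255.32, 2124.67])
X_te = np.array([7.62022, 13.1943, 7.76752, 8.36949, 7.86459, 7.16032,
                 12.7035, 8.99822, 6.32853, 9.22345, 11.4751])

param_grid = {"alpha": [1e-2, 1e-3],
              "kernel": [RBF(l) for l in np.logspace(-1, 1, 2)]}
clf = GridSearchCV(GaussianProcessRegressor(), param_grid, cv=4, scoring='r2')
clf.fit(X_tr.reshape(-1, 1), y_tr)

# The refit best estimator predicts directly; return_std=True also
# yields the GP's predictive standard deviation for each test point.
y_pred, y_std = clf.best_estimator_.predict(X_te.reshape(-1, 1), return_std=True)
print(y_pred.shape, y_std.shape)  # (11,) (11,)
```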
Upvotes: 6