Reputation: 401
I have the following training data:
x = [
[0.914728682,5.217,5,0.217,3.150362319,33.36,35,-1.64,4.220113852],
[0.885057471,7.793,8,-0.207,3.380911063,46.84,48,-1.16,4.448243115],
[0.871345029,7.152,7,0.152,3.976205037,44.98,47,-2.02,5.421236592],
[0.821428571,8.04,8,0.04,2.909880565,52.02,54.5,-2.48,2.824104235],
[0.931372549,8.01,8,0.01,4.616714697,48.04,48,0.04,9.650462033],
[0.66367713,5.424,5.5,-0.076,1.37804878,32.6,35.5,-2.9,1.189781022],
[0.78,8.66,9,-0.34,2.272965879,48.47,55,-6.53,2.564550265],
[0.227272727,19.55,21,-1.45,1.860133206,128.23,147,-18.77,1.896893491],
[0.47826087,10.09,8,2.09,1.155519927,74.43,64,10.43,1.169547454],
[0.652694611,6.775,4,2.775,1.05529595,43.1,30,13.1,1.062885327],
[0.798561151,3.986,2,1.986,0.656563993,25.38,13,12.38,0.652442159],
[0.666666667,5.419,3,2.419,1.057985162,34.37,16,18.37,0.981719509],
[0.5625,7.719,2,5.719,0.6421797,46.91,12,34.91,0.665673336]
]
and the following labels (scores):
y = [0.237113402,0.168831169,0.104166667,0.086419753,0.063147368,0.016042781,
0.014814815,0,0,-0.0794,-0.14,-0.1832,-0.2385]
It seems clear that the larger the values in column 5 and column 9 are, the higher the scores.
I wrote the following code, which uses SVR on the training data above:
from sklearn.preprocessing import RobustScaler
from sklearn.svm import SVR

rb = RobustScaler()
xScaled = rb.fit_transform(x)
model = SVR(C=1.0, epsilon=0.1)
model.fit(xScaled, y)
But no matter which of the following I use for prediction, it does not give a score that looks right:
score = model.predict(rb.fit_transform(testData))
score = model.predict(testData)
If I do something like the following during training:
xScaled = preprocessing.scale(x)
model = SVR(C=1.0, epsilon=0.1)
model.fit(xScaled,y)
then:
score = svmModel.predict(testData)
I get back something close to the original y.
But when I pick a row in x, put it in a 2D array with one row called testData, and do:
score = svmModel.predict(testData)
I get a wrong score. In fact, no matter which row of x I use to build the one-row 2D array, I get the same score.
What have I done wrong? I would be extremely grateful if someone could help.
Upvotes: 1
Views: 443
Reputation: 36599
1) score = model.predict(rb.fit_transform(testData))
When you do the above, you are re-fitting the RobustScaler to the new data. The scaler then computes its centre and scale from testData instead of from the training data, so the transformed values no longer match the scale the model was trained on, and the predictions will be wrong.
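This probably also explains why every row gives the same score. A small sketch, using two rows from the question (the names row_a and row_b are mine): when RobustScaler is fitted on a single row, the median of each column is the row itself and the interquartile range is zero, so the transformed row is all zeros whatever the input was.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Two different rows taken from the question's training data.
row_a = np.array([[0.914728682, 5.217, 5, 0.217, 3.150362319,
                   33.36, 35, -1.64, 4.220113852]])
row_b = np.array([[0.5625, 7.719, 2, 5.719, 0.6421797,
                   46.91, 12, 34.91, 0.665673336]])

# Re-fitting on one row: each column's median equals the value itself,
# and the zero interquartile range is treated as a scale of 1.
scaled_a = RobustScaler().fit_transform(row_a)
scaled_b = RobustScaler().fit_transform(row_b)

print(scaled_a)  # all zeros
print(scaled_b)  # all zeros as well, so the model sees identical input
```

Since the model receives the same all-zero vector each time, it returns the same prediction for every row.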
2) score = model.predict(testData)
In the above, you are not scaling the test data at all, so it is on a different scale from what the SVR has learnt. Hence the results will be bad here too.
What you need to do:
score = model.predict(rb.transform(testData))
Calling transform() scales the supplied data using the centre and scale learned from the training data, so the SVR sees inputs on the same scale it was trained on and can predict the output properly.
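Putting it together, here is a minimal sketch with the data from the question. It fits the scaler once, then scores each row individually with transform() and compares that with scoring the whole scaled training set. The Pipeline part at the end is my own suggestion, not something from the question: it bundles the scaler and the model so the scaling step cannot be forgotten or re-fitted by accident.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.svm import SVR

x = np.array([
    [0.914728682, 5.217, 5, 0.217, 3.150362319, 33.36, 35, -1.64, 4.220113852],
    [0.885057471, 7.793, 8, -0.207, 3.380911063, 46.84, 48, -1.16, 4.448243115],
    [0.871345029, 7.152, 7, 0.152, 3.976205037, 44.98, 47, -2.02, 5.421236592],
    [0.821428571, 8.04, 8, 0.04, 2.909880565, 52.02, 54.5, -2.48, 2.824104235],
    [0.931372549, 8.01, 8, 0.01, 4.616714697, 48.04, 48, 0.04, 9.650462033],
    [0.66367713, 5.424, 5.5, -0.076, 1.37804878, 32.6, 35.5, -2.9, 1.189781022],
    [0.78, 8.66, 9, -0.34, 2.272965879, 48.47, 55, -6.53, 2.564550265],
    [0.227272727, 19.55, 21, -1.45, 1.860133206, 128.23, 147, -18.77, 1.896893491],
    [0.47826087, 10.09, 8, 2.09, 1.155519927, 74.43, 64, 10.43, 1.169547454],
    [0.652694611, 6.775, 4, 2.775, 1.05529595, 43.1, 30, 13.1, 1.062885327],
    [0.798561151, 3.986, 2, 1.986, 0.656563993, 25.38, 13, 12.38, 0.652442159],
    [0.666666667, 5.419, 3, 2.419, 1.057985162, 34.37, 16, 18.37, 0.981719509],
    [0.5625, 7.719, 2, 5.719, 0.6421797, 46.91, 12, 34.91, 0.665673336],
])
y = np.array([0.237113402, 0.168831169, 0.104166667, 0.086419753, 0.063147368,
              0.016042781, 0.014814815, 0, 0, -0.0794, -0.14, -0.1832, -0.2385])

# Fit the scaler on the training data only, then train on the scaled data.
rb = RobustScaler()
xScaled = rb.fit_transform(x)
model = SVR(C=1.0, epsilon=0.1)
model.fit(xScaled, y)

# Predictions for the whole training set at once.
batch_scores = model.predict(xScaled)

# Predictions one row at a time, reusing the already-fitted scaler.
single_scores = np.array([
    model.predict(rb.transform(x[i:i + 1]))[0] for i in range(len(x))
])

# Alternative (my suggestion): let a Pipeline handle the scaling.
pipe = make_pipeline(RobustScaler(), SVR(C=1.0, epsilon=0.1))
pipe.fit(x, y)
pipe_scores = pipe.predict(x)  # raw, unscaled rows go in

for i in range(len(x)):
    print(i, single_scores[i], batch_scores[i], pipe_scores[i])
```

With the scaler fitted only once, a single row gets the same score whether it is predicted alone or as part of the full array.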
Upvotes: 1