Reputation: 1122
I am using sklearn.svm.SVR for a regression task in which I want to use my own custom kernel. Here are some sample rows from the dataset, followed by the code:
index density speed label
0 14 58.844020 77.179139
1 29 67.624946 78.367394
2 44 77.679100 79.143744
3 59 79.361877 70.048869
4 74 72.529289 74.499239
.... and so on
from sklearn import svm
import pandas as pd
import numpy as np
density = np.random.randint(0,100, size=(3000, 1))
speed = np.random.randint(20,80, size=(3000, 1)) + np.random.random(size=(3000, 1))
label = np.random.randint(20,80, size=(3000, 1)) + np.random.random(size=(3000, 1))
d = np.hstack((density, speed, label))
data = pd.DataFrame(d, columns=['density', 'speed', 'label'])
data.density = data.density.astype(dtype=np.int32)
def my_kernel(X,Y):
    return np.dot(X,X.T)
svr = svm.SVR(kernel=my_kernel)
x = data[['density', 'speed']].iloc[:2000]
y = data['label'].iloc[:2000]
x_t = data[['density', 'speed']].iloc[2000:3000]
y_t = data['label'].iloc[2000:3000]
svr.fit(x,y)
y_preds = svr.predict(x_t)
The problem occurs on the last line, svr.predict, which raises:
X.shape[1] = 1000 should be equal to 2000, the number of samples at training time
I searched the web for a way to deal with the problem, but many similar questions (such as {1}, {2}, {3}) were left unanswered.
I had previously used SVM methods with the rbf, sigmoid, ... kernels and the code worked just fine, but this was my first time using a custom kernel, so I suspected that it was the reason for the error.
After a little research and reading the documentation, I found out that when using precomputed kernels, the matrix passed to SVR.predict() must have shape [n_samples_test, n_samples_train].
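For reference, here is a minimal sketch of the precomputed workflow as I understand it from the documentation. The data is synthetic and the names X_train, X_test, K_train, K_test and svr_pre are placeholders of mine, not part of my real code; a plain linear kernel is assumed.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic stand-ins for the real features and labels (assumption).
rng = np.random.default_rng(0)
X_train = rng.random((200, 2))
y_train = rng.random(200)
X_test = rng.random((50, 2))

# Gram matrix between training samples: shape (n_train, n_train).
K_train = X_train @ X_train.T
# Kernel values between each test sample and each training sample:
# shape (n_test, n_train).
K_test = X_test @ X_train.T

svr_pre = SVR(kernel="precomputed")
svr_pre.fit(K_train, y_train)
y_pred = svr_pre.predict(K_test)

print(K_train.shape, K_test.shape, y_pred.shape)
```

With kernel="precomputed" the model never sees the raw features, so at prediction time it has to be handed the kernel values between every test sample and every training sample, which is presumably why the expected shape is [n_samples_test, n_samples_train].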
How should I modify x_t so that prediction works without errors, just as it does with the built-in kernels?
If possible, please also explain why the input to svr.predict with a precomputed kernel differs from the input expected with the other kernels.
I really hope the related unanswered questions can be answered as well.
Upvotes: 2
Views: 347
Reputation: 2201
The problem is in your kernel function: it does not do its job.
As the documentation (https://scikit-learn.org/stable/modules/svm.html#using-python-functions-as-kernels) says, "Your kernel must take as arguments two matrices of shape (n_samples_1, n_features), (n_samples_2, n_features) and return a kernel matrix of shape (n_samples_1, n_samples_2)." The sample kernel on the same page satisfies this requirement:
def my_kernel(X, Y):
    return np.dot(X, Y.T)
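In your function, the second argument of dot is X.T, so the output has shape (n_samples_1, n_samples_1), which is not what is expected. At prediction time the kernel is called with the test data and the training data, so your function returns a (1000, 1000) matrix where (1000, 2000) is required, which is exactly what the error message reports.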
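To make the difference concrete, here is a small sketch on synthetic data. The names broken_kernel, linear_kernel, X_train and X_test are mine, chosen for illustration, and the random data is only a stand-in for your DataFrame.

```python
import numpy as np
from sklearn.svm import SVR

def broken_kernel(X, Y):
    # Ignores Y, so the result is always (n_samples_1, n_samples_1).
    return np.dot(X, X.T)

def linear_kernel(X, Y):
    # Uses both arguments, so the result is (n_samples_1, n_samples_2).
    return np.dot(X, Y.T)

# Synthetic stand-ins for the real features and labels (assumption).
rng = np.random.default_rng(0)
X_train = rng.random((200, 2))
y_train = rng.random(200)
X_test = rng.random((50, 2))

# Shapes the two functions produce for a (test, train) pair.
broken_shape = broken_kernel(X_test, X_train).shape
fixed_shape = linear_kernel(X_test, X_train).shape
print(broken_shape, fixed_shape)

# With the corrected kernel, fit and predict take the raw features directly.
svr = SVR(kernel=linear_kernel)
svr.fit(X_train, y_train)
y_pred = svr.predict(X_test)
print(y_pred.shape)
```

With a callable kernel you do not need to reshape x_t at all: the estimator calls the function itself with the test and training features, so only the function has to return the right shape.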
Upvotes: 2
Reputation: 67
A shape mismatch means that the test data and the training data do not have compatible shapes; always think in terms of matrices or arrays in numpy. Arithmetic operations need compatible shapes, which is why we check array.shape. You can modify the shapes to get [n_samples_test, n_samples_train], but it's not the best idea.
array.shape, reshape, and resize are used for that.
Upvotes: 0