How should I modify the test data for SVM method to be able to use the `precomputed` kernel function without error?

Question

I am using sklearn.svm.SVR for a regression task, and I want to use my own custom kernel. Here is a sample of the dataset and the code:

 index   density     speed        label
 0         14      58.844020    77.179139
 1         29      67.624946    78.367394
 2         44      77.679100    79.143744
 3         59      79.361877    70.048869
 4         74      72.529289    74.499239
 .... and so on

from sklearn import svm
import pandas as pd
import numpy as np

density = np.random.randint(0,100, size=(3000, 1))
speed   = np.random.randint(20,80, size=(3000, 1)) + np.random.random(size=(3000, 1))
label   = np.random.randint(20,80, size=(3000, 1)) + np.random.random(size=(3000, 1))

d    = np.hstack((density, speed, label))
data = pd.DataFrame(d, columns=['density', 'speed', 'label'])
data.density = data.density.astype(dtype=np.int32)

def my_kernel(X,Y):
    return np.dot(X,X.T)

svr = svm.SVR(kernel=my_kernel)
x = data[['density', 'speed']].iloc[:2000]
y = data['label'].iloc[:2000]
x_t = data[['density', 'speed']].iloc[2000:3000]
y_t = data['label'].iloc[2000:3000]

svr.fit(x,y)
y_preds = svr.predict(x_t)

The problem occurs in the last line, svr.predict, which raises:

X.shape[1] = 1000 should be equal to 2000, the number of samples at training time

I searched the web for a way to deal with the problem, but many similar questions (like {1}, {2}, {3}) were left unanswered.

I had used SVM methods with the rbf, sigmoid, ... kernels before and the code worked just fine, but this was my first time using a custom kernel, so I suspected that this was the cause of the error.

After a little research and reading the documentation, I found out that when using precomputed kernels, the matrix passed to SVR.predict() must have shape [n_samples_test, n_samples_train].

How should I modify the test data (x_t) so that prediction works without error, as it does when I don't use a custom kernel?

If possible, please explain why the input to the predict function for a precomputed kernel differs from the input for the other kernels.

I really hope the related unanswered questions can be answered as well.

aparpara · Accepted Answer

The problem is in your kernel function: it doesn't do the job.

As the documentation https://scikit-learn.org/stable/modules/svm.html#using-python-functions-as-kernels says, "Your kernel must take as arguments two matrices of shape (n_samples_1, n_features), (n_samples_2, n_features) and return a kernel matrix of shape (n_samples_1, n_samples_2)." The sample kernel on the same page satisfies this criterion:

def my_kernel(X, Y):
    return np.dot(X, Y.T)

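In your function the second argument of dot is X.T, so the output has shape (n_samples_1, n_samples_1), which is not what is expected. At prediction time that gives a 1000 x 1000 matrix instead of the required 1000 x 2000, hence the error message.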
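As a rough illustration of the shapes involved, here is a minimal sketch on small synthetic data (the arrays, sizes, and the name linear_kernel are made up for this example and are not from the question). It fits one SVR with the corrected callable kernel and another with kernel="precomputed", where the Gram matrices are built by hand: (n_train, n_train) for fit and (n_test, n_train) for predict. With a callable kernel, the raw feature matrix can be passed to predict unchanged, so the test data itself does not need to be modified.

```python
import numpy as np
from sklearn import svm

# Hypothetical synthetic data, only to demonstrate the shapes.
rng = np.random.default_rng(0)
X = rng.random((300, 2))
y = X @ np.array([1.5, -2.0]) + 0.1 * rng.standard_normal(300)
X_train, X_test = X[:200], X[200:]
y_train = y[:200]

def linear_kernel(A, B):
    # Must return shape (A.shape[0], B.shape[0]); note B.T, not A.T.
    return np.dot(A, B.T)

# Option 1: callable kernel. fit and predict take the raw features.
svr_callable = svm.SVR(kernel=linear_kernel)
svr_callable.fit(X_train, y_train)
preds_callable = svr_callable.predict(X_test)

# Option 2: precomputed kernel. fit and predict take Gram matrices.
gram_train = linear_kernel(X_train, X_train)  # shape (200, 200)
gram_test = linear_kernel(X_test, X_train)    # shape (100, 200)
svr_pre = svm.SVR(kernel="precomputed")
svr_pre.fit(gram_train, y_train)
preds_pre = svr_pre.predict(gram_test)

print(gram_test.shape, preds_callable.shape, preds_pre.shape)
```

In the precomputed case each test row holds that test sample's kernel values against every training sample, which is why the second dimension has to equal the number of training samples.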
