Norhther

Reputation: 500

Improving performance when calculating a kernel matrix

I have the following code:

import numpy as np
from sklearn import svm
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score
from functools import partial
import pandas as pd

# Tanimoto similarity: sum of element-wise minima over sum of
# element-wise maxima.
def tanimotoKernel(xs, ys):
    a = 0
    b = 0
    for x, y in zip(xs, ys):
        a += min(x, y)
        b += max(x, y)
    return a / b

# gammaExp = 1 / (np.exp(gamma) - 1), calculated once outside the kernel
def tanimotoLambdaKernel(xs, ys, gamma, gammaExp):
    return np.exp(gamma * tanimotoKernel(xs, ys) - 1) * gammaExp

# Wraps a kernel so SVC's callable kernel= argument receives a function
# that returns the full Gram matrix.
class GramBuilder:
    def __init__(self, Kernel):
        self._Kernel = Kernel
    def generateMatrixBuilder(self, X1, X2):
        gram_matrix = np.zeros((X1.shape[0], X2.shape[0]))
        for i, x1 in enumerate(X1):
            for j, x2 in enumerate(X2):
                gram_matrix[i, j] = self._Kernel(x1, x2)
        return gram_matrix

gammaList = [0.0001, 0.001, 0.01, 0.1, 1, 10, 100]
CList = [0.001, 0.01, 0.1, 1, 10, 100]


X, y = datasets.load_digits(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y)

svc_list = [
    (svm.SVC(
        kernel=GramBuilder(
            partial(tanimotoLambdaKernel, gamma=x, gammaExp=1/(np.exp(x) - 1)))
        .generateMatrixBuilder),
     x)
    for x in gammaList
]

gammas   = []
Cs       = []
accuracy = []
for svc, gamma in svc_list:
    print("Training gamma ", gamma)
    clf = GridSearchCV(svc, {'C': CList}, verbose=1, n_jobs=-1)
    clf.fit(x_train, y_train)
    gammas.append(gamma)
    Cs.append(clf.best_params_['C'])
    accuracy.append(clf.best_score_)

For this toy dataset, I have to wait approximately 50 minutes to perform all the cross-validations in the loop.

The first improvement I made was to calculate gammaExp outside the function, which saves millions of exponential evaluations. Also, since multiplication is faster than division, I precomputed the inverse of the exponential minus one to save even more time.
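For context, before this change the kernel presumably divided by np.exp(gamma) - 1 on every call, roughly like the reconstruction below (the pre-optimization code is not shown above, so treat this as a sketch):

def tanimotoLambdaKernelNaive(xs, ys, gamma):
    # Hypothetical pre-optimization version: the exponential and the
    # division are re-evaluated on every single kernel call.
    return np.exp(gamma * tanimotoKernel(xs, ys) - 1) / (np.exp(gamma) - 1)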

With those modifications I improved the training time a lot; however, I still need it to be faster, so I would appreciate any ideas. Thanks.

Upvotes: 1

Views: 109

Answers (1)

Jérôme Richard

Reputation: 50278

You can use NumPy to speed up the min/max operations. Then you can use Numba's JIT to speed up the code even more by inlining the calls.

import numba as nb

@nb.njit
def tanimotoKernel(xs, ys):
    a = np.minimum(xs, ys).sum()
    b = np.maximum(xs, ys).sum()
    return a / b

@nb.njit
def tanimotoLambdaKernel(xs, ys, gamma, gammaExp):
    return np.exp(gamma * tanimotoKernel(xs, ys) - 1) * gammaExp

# [...]

The above code should be correct and is more than 20 times faster on my machine. It actually took only a few minutes to complete.

I think you can speed things up even more by removing the partial call and using Numba for the GramBuilder class too (look at the Numba documentation on JIT-compiling classes; partial functions are probably not supported, but you can store the values in the class and do that part of the job yourself). Moreover, note that many operations seem to be performed multiple times in the kernel. It is probably possible to compute them once: the kernel is called with the same x2 multiple times and recomputes the max again and again. A sketch of both ideas follows.
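As a minimal illustration of both suggestions (the names tanimoto_lambda_gram and make_tanimoto_kernel are mine, not from the question): the sketch below replaces partial and the GramBuilder class with a plain closure, JIT-compiles the whole double loop, and exploits the identity min(x, y) + max(x, y) == x + y so the per-pair maximum reduces to row sums computed once per matrix.

import numpy as np
import numba as nb
from sklearn import svm

@nb.njit(parallel=True)
def tanimoto_lambda_gram(X1, X2, gamma, gammaExp):
    # Row sums are computed once: since min(x, y) + max(x, y) == x + y
    # element-wise, sum(max(x, y)) == sum(x) + sum(y) - sum(min(x, y)),
    # so np.maximum never has to be evaluated per pair.
    s1 = X1.sum(axis=1)
    s2 = X2.sum(axis=1)
    gram = np.empty((X1.shape[0], X2.shape[0]))
    for i in nb.prange(X1.shape[0]):  # rows are independent, so parallelize
        for j in range(X2.shape[0]):
            a = np.minimum(X1[i], X2[j]).sum()
            t = a / (s1[i] + s2[j] - a)  # Tanimoto similarity
            gram[i, j] = np.exp(gamma * t - 1) * gammaExp
    return gram

def make_tanimoto_kernel(gamma):
    # Precompute the constant factor once per gamma, as in the question.
    gammaExp = 1 / (np.exp(gamma) - 1)
    return lambda X1, X2: tanimoto_lambda_gram(X1, X2, gamma, gammaExp)

svc = svm.SVC(kernel=make_tanimoto_kernel(0.01))

A closure is enough here because SVC only requires a callable that takes two arrays and returns the Gram matrix of shape (len(X1), len(X2)).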

Upvotes: 2
