Random seed on SVM sklearn produces different results

Question

When I run an SVM, I get different results on each run, even with a fixed random_state=42.

I have 10 classes and a dataset of 200 examples. The dimensions of my dataset are dim_dataset=(200, 2048).

Here is my code:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn import svm
import random
random.seed(42)

def shuffle_data(x,y):
    idx = np.random.permutation(len(x))
    x_data= x[idx]
    y_labels=y[idx]
    return x_data,y_labels

d,l=shuffle_data(dataset,true_labels) # dim_d=(200,2048) , dim_l=(200,)

X_train, X_test, y_train, y_test = train_test_split(d, l, test_size=0.30, random_state=42)

# hist intersection kernel
gramMatrix = histogramIntersection(X_train, X_train)
clf_gram = svm.SVC(kernel='precomputed', random_state=42).fit(gramMatrix, y_train)
predictMatrix = histogramIntersection(X_test, X_train)
SVMResults = clf_gram.predict(predictMatrix)
correct = sum(1.0 * (SVMResults == y_test))
accuracy = correct / len(y_test)
print("SVM (Histogram Intersection): " + str(accuracy) + " (" + str(int(correct)) + "/" + str(len(y_test)) + ")")


# libsvm linear kernel
clf_linear_kernel = svm.SVC(kernel='linear', random_state=42).fit(X_train, y_train)
predicted_linear = clf_linear_kernel.predict(X_test)
correct_linear_libsvm = sum(1.0 * (predicted_linear == y_test))
accuracy_linear_libsvm = correct_linear_libsvm / len(y_test)
print("SVM (linear kernel libsvm): " + str(accuracy_linear_libsvm) + " (" + str(int(correct_linear_libsvm)) + "/" + str(len(y_test)) + ")")

# liblinear linear kernel

clf_linear_kernel_liblinear = LinearSVC(random_state=42).fit(X_train, y_train)
predicted_linear_liblinear = clf_linear_kernel_liblinear.predict(X_test)
correct_linear_liblinear = sum(1.0 * (predicted_linear_liblinear == y_test))
accuracy_linear_liblinear = correct_linear_liblinear / len(y_test)
print("SVM (linear kernel liblinear): " + str(accuracy_linear_liblinear) + " (" + str(
        int(correct_linear_liblinear)) + "/" + str(len(y_test)) + ")")

What's wrong with my code?

Vivek Kumar · Accepted Answer

Use numpy.random.seed() instead of Python's built-in random.seed(), like this:

np.random.seed(42)

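scikit-learn uses NumPy internally to generate random numbers, so calling only random.seed() does not affect NumPy's generator, which stays unseeded. In your code, np.random.permutation inside shuffle_data draws from that unseeded generator, so the data order changes on every run. train_test_split then applies the same fixed split to differently ordered data, which gives different train and test sets and therefore different accuracies.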
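To make the difference concrete, here is a minimal sketch that reuses shuffle_data from the question on small stand-in arrays (the arrays x and y below are made up for illustration, not the asker's data). It compares seeding only Python's random module with seeding NumPy, and also shows a local np.random.RandomState as one possible alternative that avoids touching global state.

```python
import random

import numpy as np


def shuffle_data(x, y):
    """Shuffle samples and labels with the same permutation (as in the question)."""
    idx = np.random.permutation(len(x))
    return x[idx], y[idx]


# Hypothetical stand-in data; the question's real data has shape (200, 2048).
x = np.arange(40).reshape(20, 2)
y = np.arange(20)

# Seeding only Python's `random` module leaves NumPy's global generator alone,
# so the two shuffles below are not tied to the seed.
random.seed(42)
_, y_unseeded_1 = shuffle_data(x, y)
random.seed(42)
_, y_unseeded_2 = shuffle_data(x, y)
print("random.seed only, same order:", np.array_equal(y_unseeded_1, y_unseeded_2))

# Seeding NumPy's global generator makes the permutation repeatable.
np.random.seed(42)
x_a, y_a = shuffle_data(x, y)
np.random.seed(42)
x_b, y_b = shuffle_data(x, y)
print("np.random.seed, same order:", np.array_equal(y_a, y_b))

# Alternative (an assumption, not part of the original answer): a local
# generator keeps the seed explicit without changing global state.
rng = np.random.RandomState(42)
idx = rng.permutation(len(x))
x_c, y_c = x[idx], y[idx]
```

Note that train_test_split shuffles the data by default before splitting, so with random_state=42 the manual shuffle_data step may not be needed at all.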
