Jibin Mathew

Reputation: 5102

Is there a need to normalise input vector for prediction in SVM?

I understand that when the input features are on different scales, the values used to train the classifier have to be normalized for correct classification (SVM).

So does the input vector for prediction also need to be normalized?

In my scenario, the training data is normalized, serialized, and saved in the database. When a prediction is needed, the serialized data is deserialized to get the normalized numpy array, the classifier is fit on that array, and the input vector is then passed to the classifier for prediction. Does this input vector also need to be normalized? If so, how can I do it, given that at prediction time I no longer have the original training data to normalize against?

Also, I am normalizing along axis=0, i.e. along each column.

My code for normalizing is:

preprocessing.normalize(data, norm='l2',axis=0)

Is there a way to serialize preprocessing.normalize?

Upvotes: 4

Views: 5819

Answers (1)

Rob

Reputation: 1131

For SVMs, using a scaler is recommended for several reasons.

  • Many optimization methods behave better when all features are on the same scale.
  • Many kernel functions internally use a Euclidean distance to compare two samples (in the Gaussian kernel, the Euclidean distance appears in the exponent). If the features have different scales, the Euclidean distance is dominated by the features with the largest scale.
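The second point can be illustrated with a small sketch. The two sample vectors below are made up for illustration (they are not from the question); the first feature is in the thousands and the second is below 1.

```python
import numpy as np

# Hypothetical samples: feature 0 is in the thousands, feature 1 is below 1.
a = np.array([1000.0, 0.1])
b = np.array([1010.0, 0.9])

diff = a - b
squared_terms = diff ** 2                # per-feature contribution to the squared distance
distance = np.sqrt(squared_terms.sum())  # plain Euclidean distance

# Fraction of the squared distance that comes from each feature.
share = squared_terms / squared_terms.sum()
print(distance, share)
```

Here the first feature accounts for more than 99% of the squared distance, even though the second feature changes by a factor of nine between the two samples.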

A common way to put the features on the same scale is to subtract each feature's mean and divide by its standard deviation:

        xi - mi
xi -> ------------
         sigmai

You must store the mean and standard deviation of every feature in the training set so that you can apply the same operations to future data.
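As a rough sketch of what that means in NumPy: the arrays and the names X_train and x_new below are hypothetical. The point is only that the statistics come from the training set and are reused unchanged on new data.

```python
import numpy as np

# Hypothetical training matrix: rows are samples, columns are features.
X_train = np.array([[1.0, 200.0],
                    [2.0, 300.0],
                    [3.0, 400.0]])

# Per-feature statistics, computed once on the training set and stored.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)  # note: a constant feature would give std == 0

X_train_scaled = (X_train - mean) / std

# A new sample is scaled with the stored training statistics,
# never with statistics computed from the new sample itself.
x_new = np.array([[2.5, 250.0]])
x_new_scaled = (x_new - mean) / std
print(x_new_scaled)
```

Only mean and std (one number each per feature) have to be kept for prediction time, not the training data itself.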

In Python, scikit-learn provides a class that does this for you:

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html

To obtain the means and standard deviations:

from sklearn import preprocessing

scaler = preprocessing.StandardScaler().fit(X)

Then, to normalize the training set (X is a matrix where every row is a sample and every column is a feature):

X = scaler.transform(X)

After training, you must normalize future data with the same scaler before classifying it:

newData = scaler.transform(newData)
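On the serialization part of the question, one possible approach is to put the scaler and the classifier in a single scikit-learn pipeline and pickle the fitted pipeline, so the stored means and standard deviations travel with the model. This is a minimal sketch, not the only way to do it: the toy data, the RBF kernel, and the variable names are assumptions for illustration.

```python
import pickle

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical toy data: rows are samples, columns are features.
X_train = np.array([[1.0, 200.0], [2.0, 300.0], [8.0, 900.0], [9.0, 1000.0]])
y_train = np.array([0, 0, 1, 1])

# The pipeline fits the scaler on X_train, then trains the SVM on the scaled data.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_train, y_train)

# Serialize the fitted pipeline to bytes (these could be stored in a database).
blob = pickle.dumps(model)

# Later, at prediction time: restore and predict on raw, unscaled input.
restored = pickle.loads(blob)
new_data = np.array([[1.5, 250.0], [8.5, 950.0]])
predictions = restored.predict(new_data)
print(predictions)
```

With this layout there is no need to store the normalized training array or to refit the classifier for every prediction; the restored pipeline applies the training-set scaling to new_data on its own. Pickled models should only be loaded from a trusted source and, ideally, with the same scikit-learn version that created them.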

Upvotes: 6
