Prem

Reputation: 15

How to configure kernel selection and loss function for Support Vector Machines in Spark MLLib

I have installed Spark on AWS Elastic MapReduce (EMR) and have been running SVM using the packages in MLlib. But there are no options for choosing model-building parameters such as kernel selection and the cost of misclassification (as in the e1071 package in R). Can someone please tell me how to set these parameters when building the model?
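For reference, this is roughly the call I am making (a minimal sketch; the input path is a placeholder):

import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.util.MLUtils

// Load training data in LIBSVM format (placeholder path).
val training = MLUtils.loadLibSVMFile(sc, "s3://my-bucket/training_data.txt")

// Train a linear SVM for 100 iterations. There is no argument for a
// kernel or for a misclassification cost.
val model = SVMWithSGD.train(training, 100)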

Upvotes: 1

Views: 571

Answers (2)

Kyle.

Reputation: 156

MLlib's implementation of SVM is limited to linear kernels, so you're not going to find any kernel-related options. There is some work happening in this area, though; see Pegasos, for example.

Upvotes: 0

WestCoastProjects

Reputation: 63082

Summary / TL;DR:

The hard-coded gradient and updater for SVMWithSGD are:

private val gradient = new HingeGradient()
private val updater = new SquaredL2Updater()

Since these are hard-coded, you cannot configure them the way you are used to in R.

Details:

At the "bare metal" level the mllib SVMWithSGD supports the following parameters:

  • Weights computed for every feature.
  • Intercept computed for this model.
  • Threshold between positive/negative predictions (defaults to 0.0)
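
A minimal sketch of working with those on a trained model (assuming model is the SVMModel returned by a call like SVMWithSGD.train):

// Assuming `model` is a trained SVMModel:
println(model.weights)    // weights computed for every feature
println(model.intercept)  // intercept computed for this model

model.setThreshold(0.5)   // override the default 0.0 decision threshold
model.clearThreshold()    // make predict() return raw margins instead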

There are other convenience methods that allow you to define (see the sketch following this list):

  • regularization type (L1 vs L2)
  • regularization parameter (lambda)
  • what fraction of the input data to use for each training batch
  • initial step size (for the gradient descent)
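
These map to setters on the underlying optimizer. A sketch along the lines of the MLlib documentation (assuming training is an RDD[LabeledPoint] as above):

import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.optimization.L1Updater

val svmAlg = new SVMWithSGD()
svmAlg.optimizer
  .setNumIterations(200)      // number of gradient descent iterations
  .setStepSize(1.0)           // initial step size for gradient descent
  .setRegParam(0.1)           // regularization parameter (lambda)
  .setMiniBatchFraction(1.0)  // fraction of input data per training batch
  .setUpdater(new L1Updater)  // switch regularization from L2 to L1
val model = svmAlg.run(training)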

You will notice that the two items you mention:

  • kernel selection
  • cost of misclassification

are not included in those configurable parameters.

Under the covers, these are determined by the invocation of the GradientDescent class:

/**
 * @param gradient Gradient function to be used.
 * @param updater Updater to be used to update weights after every iteration.
 */
class GradientDescent(private var gradient: Gradient, private var updater: Updater)

with the gradient and updater fixed to the values shown in the summary above.
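
A sketch of that wiring inside SVMWithSGD, paraphrased from the Spark source (surrounding class details omitted):

// The hinge-loss gradient and the squared-L2 updater are fixed at
// construction time, so neither the loss function nor the regularizer
// can be swapped out on SVMWithSGD itself.
private val gradient = new HingeGradient()
private val updater = new SquaredL2Updater()

override val optimizer = new GradientDescent(gradient, updater)
  .setStepSize(stepSize)
  .setNumIterations(numIterations)
  .setRegParam(regParam)
  .setMiniBatchFraction(miniBatchFraction)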

Upvotes: 1
