Prem

Reputation: 15

How to configure kernel selection and loss function for Support Vector Machines in Spark MLLib

I have installed Spark on AWS Elastic MapReduce (EMR) and have been running SVM using the packages in MLlib. But there are no options for choosing model-building parameters such as kernel selection and the cost of misclassification (as in the e1071 package in R). Can someone please tell me how to set these parameters when building the model?
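For reference, this is roughly the call I am making (a minimal sketch; the input path is a placeholder):

import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.util.MLUtils

// Load training data in LIBSVM format (placeholder path).
val training = MLUtils.loadLibSVMFile(sc, "s3://my-bucket/training_data.txt")

// Train a linear SVM for 100 iterations. There is no argument for a
// kernel or for a misclassification cost.
val model = SVMWithSGD.train(training, 100)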

Upvotes: 1

Views: 571

Answers (2)

Kyle.

Reputation: 156

MLlib's implementation of SVM is limited to linear kernels, so you're not going to find any kernel-related options. There is some work happening in this area, though; see Pegasos, for example.

Upvotes: 0

WestCoastProjects

Reputation: 63082

Summary / TL;DR:

The hard-coded gradient and updater for SVMWithSGD are:

private val gradient = new HingeGradient()
private val updater = new SquaredL2Updater()

Since these are hard-coded, you cannot configure them the way you are used to in R.

Details:

At the "bare metal" level the mllib SVMWithSGD supports the following parameters:

  • Weights computed for every feature.
  • Intercept computed for this model.
  • Threshold between positive/negative predictions (defaults to 0.0)
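
A minimal sketch of working with those on a trained model (assuming model is the SVMModel returned by a call like SVMWithSGD.train):

// Assuming `model` is a trained SVMModel:
println(model.weights)    // weights computed for every feature
println(model.intercept)  // intercept computed for this model

model.setThreshold(0.5)   // override the default 0.0 decision threshold
model.clearThreshold()    // make predict() return raw margins instead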

There are other convenience methods that allow you to define (see the sketch following this list):

  • regularization type (L1 vs L2)
  • regularization parameter (lambda)
  • what fraction of the input data to use for each training batch
  • initial step size (for the gradient descent)
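
These map to setters on the underlying optimizer. A sketch along the lines of the MLlib documentation (assuming training is an RDD[LabeledPoint] as above):

import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.optimization.L1Updater

val svmAlg = new SVMWithSGD()
svmAlg.optimizer
  .setNumIterations(200)      // number of gradient descent iterations
  .setStepSize(1.0)           // initial step size for gradient descent
  .setRegParam(0.1)           // regularization parameter (lambda)
  .setMiniBatchFraction(1.0)  // fraction of input data per training batch
  .setUpdater(new L1Updater)  // switch regularization from L2 to L1
val model = svmAlg.run(training)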

You will notice that the two items you mention:

  • kernel selection
  • cost of misclassification

are not included in those configurable parameters.

Under the covers, these are determined by the invocation of the GradientDescent class:

/**
 * @param gradient Gradient function to be used.
 * @param updater Updater to be used to update weights after every iteration.
 */
class GradientDescent(private var gradient: Gradient, private var updater: Updater)

with the gradient and updater fixed to the values shown in the summary above.
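
A sketch of that wiring inside SVMWithSGD, paraphrased from the Spark source (surrounding class details omitted):

// The hinge-loss gradient and the squared-L2 updater are fixed at
// construction time, so neither the loss function nor the regularizer
// can be swapped out on SVMWithSGD itself.
private val gradient = new HingeGradient()
private val updater = new SquaredL2Updater()

override val optimizer = new GradientDescent(gradient, updater)
  .setStepSize(stepSize)
  .setNumIterations(numIterations)
  .setRegParam(regParam)
  .setMiniBatchFraction(miniBatchFraction)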

Upvotes: 1
