user4658980

Reputation:

How to set a custom loss function in Spark MLlib

I would like to use my own loss function instead of the squared loss for the linear regression model in Spark MLlib. So far I can't find any part of the documentation that mentions whether this is even possible.

Upvotes: 8

Views: 1728

Answers (1)

Iman Mirzadeh

Reputation: 13570

TL;DR: it is not easy, because you cannot simply pass a loss function to Spark's built-in models. However, you can write a customized model yourself without much effort.

Long answer:
If you look at the code of LinearRegressionWithSGD you will see:

class LinearRegressionWithSGD private[mllib] (
    private var stepSize: Double,
    private var numIterations: Int,
    private var regParam: Double,
    private var miniBatchFraction: Double)
  extends GeneralizedLinearAlgorithm[LinearRegressionModel] with Serializable {

  private val gradient = new LeastSquaresGradient() // loss function
  private val updater = new SimpleUpdater()
  @Since("0.8.0")
  override val optimizer = new GradientDescent(gradient, updater) // optimizer
    .setStepSize(stepSize)
    .setNumIterations(numIterations)
    .setRegParam(regParam)
    .setMiniBatchFraction(miniBatchFraction)
  // ...
}

So, let's look at how the least squares loss function (LeastSquaresGradient) is implemented:

class LeastSquaresGradient extends Gradient {
  // Returns the gradient and the loss for one example.
  override def compute(data: Vector, label: Double, weights: Vector): (Vector, Double) = {
    val diff = dot(data, weights) - label
    val loss = diff * diff / 2.0
    val gradient = data.copy
    scal(diff, gradient)          // gradient = diff * data
    (gradient, loss)
  }

  // In-place variant: accumulates the gradient into cumGradient and returns the loss.
  override def compute(
      data: Vector,
      label: Double,
      weights: Vector,
      cumGradient: Vector): Double = {
    val diff = dot(data, weights) - label
    axpy(diff, data, cumGradient) // cumGradient += diff * data
    diff * diff / 2.0
  }
}

So, you can write your own class that extends Gradient, implement both compute overloads, and plug it into an optimizer the same way LinearRegressionWithSGD wires up its GradientDescent instance.
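As an illustration, here is a minimal sketch of a custom Gradient for absolute (L1) loss, i.e. loss = |w·x − y| with subgradient sign(w·x − y)·x. Note that AbsoluteErrorGradient is a made-up name for this example, and the dot product is computed by hand because the internal mllib BLAS helpers (dot, scal, axpy) used in Spark's own source are package-private and not accessible from user code:

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.optimization.Gradient

// Hypothetical custom loss: absolute error |w.x - y| (an assumption for
// illustration, not part of Spark). Subgradient w.r.t. weights: sign(diff) * x.
class AbsoluteErrorGradient extends Gradient {

  // mllib's internal BLAS is package-private, so compute the dot product manually.
  private def dot(x: Vector, y: Vector): Double =
    x.toArray.zip(y.toArray).map { case (a, b) => a * b }.sum

  override def compute(data: Vector, label: Double, weights: Vector): (Vector, Double) = {
    val diff = dot(data, weights) - label
    val sign = if (diff > 0) 1.0 else if (diff < 0) -1.0 else 0.0
    val gradient = Vectors.dense(data.toArray.map(_ * sign))
    (gradient, math.abs(diff))
  }

  override def compute(
      data: Vector,
      label: Double,
      weights: Vector,
      cumGradient: Vector): Double = {
    val (gradient, loss) = compute(data, label, weights)
    // Accumulate into cumGradient; for a DenseVector, toArray exposes the
    // backing array, so this updates it in place.
    val cum = cumGradient.toArray
    gradient.toArray.zipWithIndex.foreach { case (g, i) => cum(i) += g }
    loss
  }
}
```

One caveat: in some Spark versions the GradientDescent constructor is itself package-private, so you may not be able to instantiate it directly with your gradient; a common workaround is to place your class under the org.apache.spark.mllib package or write your own subclass of GeneralizedLinearAlgorithm, as LinearRegressionWithSGD does.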

Upvotes: 2
