Tiffany

Reputation: 273

How is logistic regression parallelized in Spark?

I would like some insight into the method used to parallelize logistic regression in the ML library. I have already tried reading the source code, but I could not follow the process.

Upvotes: 0

Views: 785

Answers (1)

jamborta

Reputation: 5210

Spark uses so-called mini-batch gradient descent for regression:

http://ruder.io/optimizing-gradient-descent/index.html#minibatchgradientdescent

In a nutshell, it works like this:

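  1. Select a random sample (mini-batch) of the data
  2. Compute the gradient on each row of the sample, in parallel across the partitions
  3. Aggregate the gradients and use the result to update the weights
  4. Go back to step 1 until the iteration limit or convergence is reached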
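
The loop above can be sketched in plain Python, without Spark, to show where the parallelism comes from. This is only an illustration under assumptions, not Spark's implementation: the partitions are ordinary lists standing in for distributed partitions, and the function names (`row_gradient`, `partition_gradient`, `minibatch_sgd`), the toy dataset, and the decaying step size are all hypothetical choices made for this example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def row_gradient(weights, features, label):
    # Log-loss gradient for a single row: (sigmoid(w . x) - y) * x
    margin = sum(w * x for w, x in zip(weights, features))
    error = sigmoid(margin) - label
    return [error * x for x in features]


def partition_gradient(weights, rows):
    # Work done independently per partition (on a cluster, one task each):
    # sum the row gradients and count the rows that were seen.
    total = [0.0] * len(weights)
    for features, label in rows:
        grad = row_gradient(weights, features, label)
        total = [t + g for t, g in zip(total, grad)]
    return total, len(rows)


def minibatch_sgd(partitions, num_features, iterations=200,
                  fraction=0.5, step=1.0, seed=42):
    rng = random.Random(seed)
    weights = [0.0] * num_features
    for i in range(1, iterations + 1):
        # Step 1: sample a fraction of the rows in every partition.
        sampled = [[row for row in part if rng.random() < fraction]
                   for part in partitions]
        # Step 2: per-partition gradient sums (sequential here, parallel on a cluster).
        partials = [partition_gradient(weights, part) for part in sampled]
        # Step 3: aggregate the partial sums into one gradient.
        grad_sum = [sum(p[0][j] for p in partials) for j in range(num_features)]
        count = sum(p[1] for p in partials)
        if count == 0:
            continue  # empty sample, nothing to update in this iteration
        # Update the weights; the 1/sqrt(i) decay is an assumption of this sketch.
        rate = step / math.sqrt(i)
        weights = [w - rate * g / count for w, g in zip(weights, grad_sum)]
        # Step 4: loop back to step 1.
    return weights


# Toy data: features are [bias, x], label is 1.0 when x > 0.
data = [([1.0, float(x)], 1.0 if x > 0 else 0.0) for x in (-3, -2, -1, 1, 2, 3)]
partitions = [data[0::2], data[1::2]]  # two simulated partitions
weights = minibatch_sgd(partitions, num_features=2)
print(weights)
```

Only step 2 touches the rows, and each partition can do it without talking to the others. Step 3 then needs just one small vector per partition, which is why the method distributes well.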

The actual optimisation code in Spark starts at this line: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala#L234

Upvotes: 3
