How is logistic regression parallelized in Spark?

Question

I would like some insight into the method used to parallelize logistic regression in the ML library. I already tried reading the source code, but I didn't understand the process.

jamborta · Accepted Answer

Spark uses so-called mini-batch gradient descent for regression:

http://ruder.io/optimizing-gradient-descent/index.html#minibatchgradientdescent

In a nutshell, it works like this:

1. Select a sample of the data
2. Compute the gradient on each row of the sample
3. Aggregate the gradients and update the weights
4. Go back to step 1
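
To make the loop concrete, here is a minimal sketch in plain Python (no Spark needed) that mimics it. It is an illustration, not Spark's code: the "partitions" are ordinary lists standing in for RDD partitions, the function names (row_gradient, partition_gradient, minibatch_gd) are made up for this example, and the step size decaying as step_size / sqrt(iteration) is a choice made for the sketch. The point is that each partition can sample rows and sum their gradients independently, so only one small gradient vector per partition has to be combined before the weights are updated.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def row_gradient(weights, features, label):
    # Gradient of the logistic loss for one row: (sigmoid(w . x) - y) * x
    margin = sum(w * x for w, x in zip(weights, features))
    error = sigmoid(margin) - label
    return [error * x for x in features]


def partition_gradient(weights, rows, fraction, rng):
    # Work done independently on each partition (the parallel part):
    # sample rows with probability `fraction`, then sum their gradients locally.
    grad_sum = [0.0] * len(weights)
    count = 0
    for features, label in rows:
        if rng.random() < fraction:
            grad = row_gradient(weights, features, label)
            grad_sum = [a + b for a, b in zip(grad_sum, grad)]
            count += 1
    return grad_sum, count


def minibatch_gd(partitions, dim, step_size=1.0, fraction=0.5,
                 iterations=200, seed=0):
    rng = random.Random(seed)
    weights = [0.0] * dim
    for i in range(1, iterations + 1):
        # Steps 1-2: every partition samples rows and computes a partial sum.
        partials = [partition_gradient(weights, rows, fraction, rng)
                    for rows in partitions]
        # Step 3: combine the partial sums into one gradient and one count.
        total = [sum(column) for column in zip(*(g for g, _ in partials))]
        n = sum(c for _, c in partials)
        if n == 0:
            continue  # empty sample this round, try again
        # Update the weights with the averaged gradient (assumed decaying step).
        lr = step_size / math.sqrt(i)
        weights = [w - lr * g / n for w, g in zip(weights, total)]
    return weights


# Toy data: each row is ([bias, x], label), split across two "partitions".
partitions = [
    [([1.0, -2.0], 0.0), ([1.0, -1.0], 0.0)],
    [([1.0, 1.0], 1.0), ([1.0, 2.0], 1.0)],
]
weights = minibatch_gd(partitions, dim=2)
print(weights)
```

In a real cluster the list comprehension over partitions would run on the executors, and only the combined gradient would travel back to the driver.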

The actual optimisation code in Spark starts at this line: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala#L234
