kathystehl

Reputation: 841

SparkR MLlib & spark.ml: least squares and glm optimization

Would anyone be able to explain how to specify optimization methods in SparkR's glm function? When I try to fit an OLS model with glm, I can only specify "normal" or "auto" as the solver type. SparkR isn't able to interpret the solver specification "l-bfgs", which leads me to believe that when I specify "auto", SparkR simply falls back to "normal" and estimates the model coefficients analytically via the least-squares normal equation.

Is fitting GLMs with stochastic gradient descent or L-BFGS not available in SparkR, or am I writing the following call incorrectly?

m <- SparkR::glm(y ~ x1 + x2 + x3, data = df, solver = "l-bfgs")

There's plenty of documentation in Spark about using iterative methods to fit GLMs, e.g. LogisticRegressionWithLBFGS and LinearRegressionWithSGD (discussed here), but I haven't been able to find any such documentation for the R API. Is this simply not available in SparkR (i.e. are SparkR users constrained to the analytical solver and, therefore, to data sizes the normal equation can handle), or am I missing something essential here? If it isn't currently available in SparkR, is it expected to arrive in SparkR 2.0.0?

Below, I create a toy data set and fit three models, each with a different solver specification:
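(The sqlContext used below comes from a standard Spark 1.6 SparkR initialization; a minimal sketch of that setup, assuming a local session and an illustrative app name:)

library(SparkR)
sc <- sparkR.init(master = "local[*]", appName = "glm-solver-test")
sqlContext <- sparkRSQL.init(sc)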

x1 <- rnorm(n=200, mean=10, sd=2)
x2 <- rnorm(n=200, mean=17, sd=3)
x3 <- rnorm(n=200, mean=8, sd=1)
y <- 1 + .2 * x1 + .4 * x2 + .5 * x3 + rnorm(n=200, mean=0, sd=.1)
dat <- cbind.data.frame(y, x1, x2, x3)

df <- as.DataFrame(sqlContext, dat)

m1 <- SparkR::glm(y ~ x1 + x2 + x3, data = df, solver = "normal")
m2 <- SparkR::glm(y ~ x1 + x2 + x3, data = df, solver = "auto")
m3 <- SparkR::glm(y ~ x1 + x2 + x3, data = df, solver = "l-bfgs")
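To compare the fits, I look at the estimated coefficients (a quick sketch; I'm assuming summary() on a SparkR glm model exposes a $coefficients matrix, as described in the 1.6 docs):

summary(m1)$coefficients  # solver = "normal"
summary(m2)$coefficients  # solver = "auto"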

The first and second models produce the same parameter estimates (supporting my assumption that SparkR solves the normal equation for both fits and that the two models are therefore equivalent). SparkR is able to fit the third model, but when I try to print its summary, the following call fails:

summary(m3)  # error

For reference, I am running this on AWS and have tried different EMR releases, including the most recent, in case that makes a difference. I am using Spark 1.6.1 (the R API).

Upvotes: 2

Views: 632

Answers (1)

SpiritusPrana

Reputation: 480

The Spark 1.6.2 API documentation is here; it describes the solver argument as follows:

solver:
The solver algorithm used for optimization, this can be "l-bfgs", "normal" and "auto". "l-bfgs" denotes Limited-memory BFGS which is a limited-memory quasi-Newton optimization method. "normal" denotes using Normal Equation as an analytical solution to the linear regression problem. The default value is "auto" which means that the solver algorithm is selected automatically.

To me, this looks worthy of a bug report on the Apache Spark Jira site.
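If you do raise one, a minimal reproduction along these lines should make it easy to triage (a sketch that reuses the toy data and sqlContext naming from the question):

df <- as.DataFrame(sqlContext, dat)
m <- SparkR::glm(y ~ x1 + x2 + x3, data = df, solver = "l-bfgs")
summary(m)  # fails here, even though "l-bfgs" is listed as a valid solver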

Upvotes: 1
