Regressor

Reputation: 1973

How to load .rds R file as a Spark dataframe in Scala

I am trying to create a production data pipeline for a model. As part of this pipeline, I save a model that runs in an R environment as a .rds file. Here is an example -

set.seed(345)

## simulate some data
df = data.frame(x = rnorm(20))
df = transform(df, y = 5 + (2.3 * x) + rnorm(20))

## fit the model
m1 = lm(y ~ x, data = df)

## take out the coefficients
coeff = m1$coefficients

> coeff
(Intercept)           x 
   4.938554    2.328345

## save the model coefficients
saveRDS(coeff, "~/Desktop/coeff.rds")

Now, I would like to somehow load these coefficients in a Scala program as a Spark Dataframe, which might look something like this -

val loadCoefficients = # some method to load .rds file as a Spark Data frame

Is there a library that would allow me to achieve this? My end result in the Spark context should look like -

loadCoefficients.show
org.apache.spark.sql.DataFrame
(Intercept)           x 
   4.938554    2.328345

Upvotes: 2

Views: 1849

Answers (1)

Check this post on SparkR, which might help. The author is doing something pretty close to what you are trying. SparkR is a shell that comes with the Spark distribution by default. Hope this helps.

https://cosminsanda.com/posts/a-compelling-case-for-sparkr/

Also check the `createDataFrame` function in SparkR, which converts an R data frame to a Spark DataFrame. If you can convert the `coeff` value to an R data frame, then you can easily convert it to a Spark DataFrame.

https://spark.apache.org/docs/2.0.0/api/R/createDataFrame.html
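A minimal sketch of that approach, assuming SparkR is installed and a local Spark session can be started (`readRDS` returns the named numeric vector saved in the question):

```r
library(SparkR)
sparkR.session()  # assumes a local Spark installation on this machine

## readRDS returns the named numeric vector saved with saveRDS
coeff <- readRDS("~/Desktop/coeff.rds")

## turn the named vector into a plain R data.frame...
coeffDF <- data.frame(term = names(coeff), estimate = as.numeric(coeff))

## ...then convert it to a Spark DataFrame with createDataFrame
sdf <- createDataFrame(coeffDF)
showDF(sdf)
```

From there you could write the table out with `write.df` (e.g. as Parquet) and read it back in your Scala program as an ordinary Spark DataFrame.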

Upvotes: 1
