Reputation: 1973
I am trying to create a production data pipeline for a model. As part of this pipeline, I save a model that runs in an R environment as an .rds file. Here is an example -
set.seed(345)
df = data.frame(x = rnorm(20))
df = transform(df, y = 5 + (2.3 * x) + rnorm(20))
## model
m1 = lm(y ~ x, data = df)
## take out the coefficients
coeff = m1$coefficients
coeff
## (Intercept)           x
##    4.938554    2.328345
## save the model coefficients
saveRDS(coeff, "~/Desktop/coeff.rds")
Now, I would like to somehow load these coefficients in a Scala program as a Spark DataFrame, which might look something like this -
val loadCoefficients = ??? // some method to load the .rds file as a Spark DataFrame
Is there any library that would allow me to achieve this? My end result in the Spark context should look like -
loadCoefficients.show
// loadCoefficients: org.apache.spark.sql.DataFrame
// (Intercept)           x
//    4.938554    2.328345
Upvotes: 2
Views: 1849
Reputation: 1134
Check this SparkR post, which might help - the author is doing something pretty close to what you are trying. SparkR is a shell that comes with the Spark distribution by default. Hope this helps.
https://cosminsanda.com/posts/a-compelling-case-for-sparkr/
Also check the createDataFrame function in SparkR, which can convert an R data frame to a Spark DataFrame. If you can convert the coeff value to an R data frame, then you can easily convert it to a Spark DataFrame.
https://spark.apache.org/docs/2.0.0/api/R/createDataFrame.html
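For example, here is a minimal sketch of that idea (untested; it assumes a Spark 2.x SparkR session and the coeff.rds path from your question):

library(SparkR)
sparkR.session()

## read the coefficients back in; readRDS returns a named numeric vector
coeff <- readRDS("~/Desktop/coeff.rds")

## turn the named vector into a one-row R data frame;
## check.names = FALSE keeps "(Intercept)" as a column name
coeff_df <- data.frame(t(coeff), check.names = FALSE)

## convert the R data frame to a Spark DataFrame
loadCoefficients <- createDataFrame(coeff_df)
showDF(loadCoefficients)

If the coefficients ultimately need to be consumed from a Scala program, one option would be to write the Spark DataFrame out from SparkR, e.g. write.df(loadCoefficients, "coeff.parquet", source = "parquet"), and read it back on the Scala side with spark.read.parquet.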
Upvotes: 1