Vishal Kolki

Reputation: 11

Error: could not find function "includePackage"

I am trying to run the Random Forest algorithm on SparkR, with Spark 1.5.1 installed, but I can't figure out why I am getting the error -

  Error: could not find function "includePackage"

Furthermore, even if I use the mapPartitions function in my code, I get the error -

  Error: could not find function "mapPartitions"

Please find the below code:

rdd <- SparkR:::textFile(sc, "http://localhost:50070/explorer.html#/Datasets/Datasets/iris.csv", 5)

includePackage(sc, randomForest)

rf <- mapPartitions(rdd, function(input) {
  ## my function code for RF
})
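For what it's worth: starting with Spark 1.4, SparkR's RDD-level API (which includes `textFile`, `mapPartitions`, and `includePackage`) is no longer exported from the package, which is why these names are "not found" even though `SparkR:::textFile` works. A minimal sketch of reaching them through the private namespace, assuming `sc` is an initialized SparkContext and `/path/to/iris.csv` is a hypothetical local path (this API is internal and unsupported, so it may change or break between releases):

```r
# SparkR::: accesses non-exported functions; the RDD API was made
# private in SparkR 1.4+, so the plain names are not visible.
rdd <- SparkR:::textFile(sc, "/path/to/iris.csv", 5)   # hypothetical path
SparkR:::includePackage(sc, randomForest)              # ship package to workers

rf <- SparkR:::mapPartitions(rdd, function(part) {
  # part is a list of lines from one partition; RF code would go here
  part
})
```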

Upvotes: 0

Views: 1269

Answers (2)

hbabbar

Reputation: 967

This is more of a comment and a follow-up question than an answer (I'm not allowed to comment because of reputation), but just to take this further: if we use the collect method to convert the RDD back to an R data frame, isn't that counterproductive? If the data is too large, it would take too long to process in R.

Also, does that mean we could use any R package, say markovChain or neuralnet, with the same methodology?

Upvotes: 1

Arun Gunalan

Reputation: 824

Kindly check the list of functions that can be used in SparkR: http://spark.apache.org/docs/latest/api/R/index.html. It does not include mapPartitions() or includePackage().

# For reading a csv in SparkR (Spark 1.5 uses the spark-csv package)
sparkRdf <- read.df(sqlContext, "./nycflights13.csv",
                    "com.databricks.spark.csv", header = "true")

# A possible way to use `randomForest` is to convert the SparkR data frame
# to a local R data frame
Rdf <- collect(sparkRdf)

# Compute as usual in R
install.packages("randomForest")
library(randomForest)
# ...

# Convert back to a SparkR data frame
sparkRdf <- createDataFrame(sqlContext, Rdf)
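To make this concrete for the asker's dataset, here is a minimal sketch of the full round trip, assuming a working `sqlContext`, the spark-csv package on the classpath, and a hypothetical local path to iris.csv (the `Species` column name follows the standard iris schema). Note hbabbar's caveat above: `collect` pulls everything onto the driver, so this only works when the data fits in local memory.

```r
library(randomForest)

# Read the CSV through SparkR, then pull it into local R memory
irisSdf <- read.df(sqlContext, "/path/to/iris.csv",    # hypothetical path
                   "com.databricks.spark.csv",
                   header = "true", inferSchema = "true")
irisDf <- collect(irisSdf)
irisDf$Species <- as.factor(irisDf$Species)

# Fit the forest locally, attach predictions, and push back to Spark
rf <- randomForest(Species ~ ., data = irisDf, ntree = 100)
irisDf$predicted <- predict(rf, irisDf)
resultSdf <- createDataFrame(sqlContext, irisDf)
```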

Upvotes: 0
