Reputation: 343
I want to run my existing R script from Spark.
I have set up R and Spark on my machine and am trying to execute the code, but I am getting an exception that is not very helpful.
Spark code:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

String file = "/home/MSA2.R";
SparkConf sparkConf = new SparkConf().setAppName("First App")
        .setMaster("local[1]");
@SuppressWarnings("resource")
JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
// pipe() forks the script as a subprocess and streams each line of the RDD
// to its stdin; it is lazy, so nothing runs until an action is invoked
JavaRDD<String> rdd = sparkContext.textFile("/home/test.csv")
        .pipe(file);
R code:
f1 <- read.csv("/home/testing.csv")
Exception:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.IllegalStateException: Subprocess exited with status 2. Command ran: /home/MSA2.R

java.util.NoSuchElementException: key not found: 1

org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
    at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
There is not much in the exception to debug the issue.
Can anyone suggest whether this approach is correct? If yes, can anyone help with the issue? If not, please suggest an alternative approach.
Note: I don't want to use SparkR.
Reference for the above code: https://www.linkedin.com/pulse/executing-existing-r-scripts-from-spark-rutger-de-graaf
Upvotes: 0
Views: 1033
Reputation: 343
I have fixed the issue. I added
#!/usr/bin/Rscript
as the first line of the R script, and it worked.
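For reference, a minimal sketch of what the fixed script could look like (the read.csv call is from the question; the shebang tells the OS to run the file with Rscript when Spark's pipe() executes it directly):
#!/usr/bin/Rscript
# Note: pipe() streams the RDD's lines to this script's stdin; this script
# instead reads from a fixed path, as in the question.
f1 <- read.csv("/home/testing.csv")
# Anything the script writes to stdout becomes the elements of the piped RDD,
# e.g.: write.csv(f1, stdout(), row.names = FALSE)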
Upvotes: 1
Reputation: 20810
The actual error is:
java.lang.IllegalStateException: Subprocess exited with status 2. Command ran: /home/MSA2.R
Make sure MSA2.R exists at the given location, on the same cluster where you are running the Spark jobs.
Generally, exit status 2 occurs when the script is not able to access the device.
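As a quick sanity check (a sketch assuming the paths from the question), you can verify the script's location, permissions, and interpreter from an R session:
file.exists("/home/MSA2.R")            # script present at the expected path?
file.access("/home/MSA2.R", mode = 1)  # executable? returns 0 on success
Sys.which("Rscript")                   # Rscript on PATH, matching the shebang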
Upvotes: 1