Reputation: 149
I don't understand the difference between running an R file with Rscript vs. spark-submit.
In the file I already pass the options to connect to the cluster, so I don't see what the advantage of using spark-submit is.
sparkR.session(
  master = "spark://...",
  appName = "test",
  sparkConfig = list(
    spark.driver.memory = "1g",
    spark.driver.cores = 1L,
    spark.executor.memory = "2g",
    spark.cores.max = 2L
  )
)
What I do in the R program after creating the Spark session is query a Parquet file stored in HDFS using SQL.
I tried both ways of running my program and, as far as I can tell, they do exactly the same thing.
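To give an idea, the query part of the script looks roughly like this (the HDFS path and view name are placeholders):

# Load the Parquet file from HDFS, register it as a temporary view,
# and run a SQL query against it
df <- read.parquet("hdfs:///path/to/file.parquet")
createOrReplaceTempView(df, "my_table")
head(sql("SELECT count(*) FROM my_table"))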
Thanks in advance
Upvotes: 1
Views: 1564
Reputation: 35229
Using spark-submit allows you to set a lot of Spark-specific options, including, but not limited to, the master URI, deploy mode, memory, cores, configuration options, jars, packages, and so on.
Most of these can be set in the Spark configuration or hard-coded in the script, but spark-submit offers more flexibility.
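For example, instead of hard-coding the options in sparkR.session, you could submit the script like this (the host, port, and file name are placeholders):

# Equivalent settings passed on the command line instead of in the script
spark-submit \
  --master spark://host:7077 \
  --deploy-mode client \
  --name test \
  --driver-memory 1g \
  --executor-memory 2g \
  --total-executor-cores 2 \
  your_script.R

The script then only needs a plain sparkR.session() call; the session picks up the configuration passed on the command line, so the same file can be submitted to different clusters without being edited.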
The same applies to other supported languages (Java, Python, Scala).
Upvotes: 1