Reputation: 1
I'm trying to connect R to Spark following the sparklyr tutorial from RStudio: http://spark.rstudio.com/
But somehow I'm getting the strange error message below. Does anyone know how to solve this?
I have tried adding C:\Windows\system32 to the Path system variable, without success. Thanks for your help.
> library(sparklyr)
> sc <- spark_connect(master = "local")
Error in sparkapi::start_shell(master = master, spark_home = spark_home, :
Failed to launch Spark shell. Ports file does not exist.
Path: C:\Users\Gaud\AppData\Local\rstudio\spark\Cache\spark-1.6.1-bin-hadoop2.6\bin\spark-submit.cmd
Parameters: --jars, "C:\Users\Gaud\Documents\R\win-library\3.3\sparklyr\java\sparklyr.jar", --packages, "com.databricks:spark-csv_2.11:1.3.0","com.amazonaws:aws-java-sdk-pom:1.10.34", sparkr-shell, C:\Users\Gaud\AppData\Local\Temp\RtmpC8MAa8\file322c47ee2a28.out
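The error says the ports file was never written, which suggests the spark-submit.cmd process either did not start or exited early. Before trying the fixes below, it may help to rule out the most basic causes. The following is a minimal sketch, not part of sparklyr: the helper name `check_spark_prereqs` is hypothetical. It checks whether the spark-submit launcher exists under a given Spark home and whether a `java` executable is on the PATH, since Spark needs Java to run.

```r
# Hypothetical helper (not part of sparklyr): basic sanity checks to run
# before calling spark_connect().
check_spark_prereqs <- function(spark_home) {
  # The launcher script is spark-submit.cmd on Windows, spark-submit elsewhere
  launcher <- if (.Platform$OS.type == "windows") "spark-submit.cmd" else "spark-submit"
  list(
    # TRUE if the launcher exists under <spark_home>/bin
    spark_submit_found = file.exists(file.path(spark_home, "bin", launcher)),
    # TRUE if a java executable can be found on the PATH
    java_on_path = unname(nzchar(Sys.which("java")))
  )
}

# Example, using the Spark home shown in the error message:
# check_spark_prereqs("C:/Users/Gaud/AppData/Local/rstudio/spark/Cache/spark-1.6.1-bin-hadoop2.6")
```

If either value is FALSE, the Spark shell cannot start, and spark_connect() would be expected to fail in a way similar to the error above.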
Upvotes: 0
Reputation: 934
Based on https://github.com/rstudio/sparklyr/issues/114, the following worked for me:
sc <- spark_connect(master = "local", config = list())
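The error output shows spark-submit being called with `--packages` for spark-csv and aws-java-sdk, which Spark normally has to resolve at startup. One plausible reading, not confirmed in this thread, is that passing an empty config list skips those default packages. Below is a minimal sketch that tries the default connection first and falls back to the workaround above. The wrapper name `connect_local_spark` and the retry strategy are my own assumptions, not part of sparklyr.

```r
# Hypothetical wrapper (not part of sparklyr): try the default local
# connection, and if it fails, retry with config = list() as in issue #114.
connect_local_spark <- function() {
  tryCatch(
    sparklyr::spark_connect(master = "local"),
    error = function(e) {
      message("Default connection failed: ", conditionMessage(e),
              "\nRetrying with config = list()")
      sparklyr::spark_connect(master = "local", config = list())
    }
  )
}

# Usage:
# sc <- connect_local_spark()
# ... work with sc ...
# sparklyr::spark_disconnect(sc)  # stop the local Spark instance when done
```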
Upvotes: 0
Reputation: 623
Install the latest sparklyr from the GitHub repository.
Steps to install sparklyr if your server has no internet access.
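Without internet access, one common approach is to download the package source tarball on a machine that has access, copy it to the server, and install it from the local file. The following is a minimal sketch of that approach. The helper name `install_from_tarball` and the file name in the usage line are hypothetical placeholders. With `repos = NULL`, `install.packages()` does not resolve dependencies, so sparklyr's dependencies have to be copied over and installed the same way first.

```r
# Hypothetical helper: install a package from a local source tarball,
# e.g. one copied over from a machine that has internet access.
install_from_tarball <- function(path) {
  if (!file.exists(path)) stop("Tarball not found: ", path)
  # repos = NULL tells install.packages() that `path` is a local file
  install.packages(path, repos = NULL, type = "source")
}

# Usage (placeholder file name):
# install_from_tarball("sparklyr_x.y.z.tar.gz")
```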
Upvotes: 1
Reputation: 540
I had the same problem recently. This bug was discussed on RStudio's sparklyr GitHub pages.
Could you please provide your sessionInfo() results?
Its output sheds light on the package versions and OS in use.
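A minimal sketch of collecting those details with base R. The `otherPkgs` lookup only returns a value when sparklyr has been attached with `library(sparklyr)`; otherwise it is NULL.

```r
# Gather the environment details worth posting with a bug report
si <- sessionInfo()
print(si)  # R version, platform, OS, and attached/loaded package versions

# Version of sparklyr if it is attached, NULL otherwise
si$otherPkgs$sparklyr$Version
```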
Two main points helped me:
spark_install()
devtools::install_github("rstudio/sparklyr")
Check the version of the sparklyr package.
In my case, the problem disappeared only after updating to sparklyr_0.4.11.
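A minimal sketch of that version check in base R. The helper name `sparklyr_at_least` is hypothetical, and 0.4.11 is simply the version reported in this answer. Comparing a `package_version` against a string compares version components numerically, so "0.4.11" correctly ranks above "0.4.9".

```r
# Hypothetical helper: is the installed sparklyr at least `min_version`?
# Returns FALSE if sparklyr is not installed at all.
sparklyr_at_least <- function(min_version = "0.4.11") {
  if (!requireNamespace("sparklyr", quietly = TRUE)) return(FALSE)
  utils::packageVersion("sparklyr") >= min_version
}

# Usage:
# if (!sparklyr_at_least()) devtools::install_github("rstudio/sparklyr")
```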
Upvotes: 2
Reputation: 783
First, make sure you have the most recent version of RStudio, if that's what you're using (close RStudio, then download and install it from here): https://www.rstudio.com/products/rstudio/download/preview/
library(DBI)
library(lazyeval)
library(dplyr)
library(devtools)
# install_github("rstudio/sparkapi")
library(sparkapi)
# install_github("rstudio/sparklyr")
library(sparklyr)
library(yaml)
library(nycflights13)
# Note: only install Spark once
spark_install(version = "1.6.1")
# Connect to Spark through connection
sc <- spark_connect(master = "local")
iris_tbl <- copy_to(sc, iris, "iris", overwrite = TRUE)
flights_tbl <- copy_to(sc, nycflights13::flights, "flights", overwrite = TRUE)
class(flights_tbl)
flights_preview <- DBI::dbGetQuery(sc, "SELECT * FROM flights LIMIT 10")
flights_preview
On Windows 10, this outputs:
# year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin
# 1 2013 1 1 517 515 2 830 819 11 UA 1545 N14228 EWR
# 2 2013 1 1 533 529 4 850 830 20 UA 1714 N24211 LGA
# 3 2013 1 1 542 540 2 923 850 33 AA 1141 N619AA JFK
# 4 2013 1 1 544 545 -1 1004 1022 -18 B6 725 N804JB JFK
# 5 2013 1 1 554 600 -6 812 837 -25 DL 461 N668DN LGA
# 6 2013 1 1 554 558 -4 740 728 12 UA 1696 N39463 EWR
# 7 2013 1 1 555 600 -5 913 854 19 B6 507 N516JB EWR
# 8 2013 1 1 557 600 -3 709 723 -14 EV 5708 N829AS LGA
# 9 2013 1 1 557 600 -3 838 846 -8 B6 79 N593JB JFK
# 10 2013 1 1 558 600 -2 753 745 8 AA 301 N3ALAA LGA
Upvotes: 0