Sandor Caetano

Reputation: 145

SparkR on Windows - Spark SQL is not built with Hive support

I'm trying to use Spark locally on my machine, and I was able to reproduce the tutorial at:

http://blog.sparkiq-labs.com/2015/07/26/installing-and-starting-sparkr-locally-on-windows-os-and-rstudio/

However, when I try to use Hive I get the following error:

Error in value[3L] : Spark SQL is not built with Hive support

The code:

## Set Environment variables
Sys.setenv(SPARK_HOME = 'F:/Spark_build')
# Set the library Path
.libPaths(c(file.path(Sys.getenv('SPARK_HOME'), 'R','lib'),.libPaths()))

# Load SparkR
library(SparkR)

sc <- sparkR.init()
sqlContext <- sparkRHive.init(sc)

sparkR.stop()

At first I suspected the pre-built version of Spark was the cause, so I built my own with Maven, which took almost an hour:

mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests clean package

However, the error persists.

Upvotes: 3

Views: 585

Answers (2)

Nicola Jean

Reputation: 1

We had the same problem, but we could not simply move to Linux. After a while we found the page "spark on windows" and came up with the following solution:

  • Create a file named hive-site.xml and write in it:

    <configuration>
      <property>
        <name>hive.exec.scratchdir</name>
        <value>C:\tmp\hive</value>
        <description>Scratch space for Hive jobs</description>
      </property>
    </configuration>

  • Set the environment variable HADOOP_CONF_DIR to the directory that contains hive-site.xml.
  • Set the environment variable HADOOP_HOME as described at "hadoop winutils".
  • Run winutils.exe chmod -R 777 C:\tmp\hive

This solved the problem on our Windows machine, where we can now run SparkR with Hive support.
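For anyone who prefers to do these steps from the R session itself, here is a rough sketch. It is not tested against every Spark version: `conf_dir` and `hadoop_home` are placeholder locations I made up for illustration, and the environment variables have to be set before `sparkR.init()` is called so that the Spark process it launches can see them.

```r
# Sketch only: conf_dir and hadoop_home are assumed example locations,
# not paths from the original answer. Adjust them to your machine.
conf_dir    <- file.path(tempdir(), "spark_conf")  # hypothetical config directory
hadoop_home <- "C:/hadoop"                         # hypothetical; should contain bin/winutils.exe

dir.create(conf_dir, recursive = TRUE, showWarnings = FALSE)

# Write the hive-site.xml shown above into the config directory.
hive_site <- c(
  "<configuration>",
  "  <property>",
  "    <name>hive.exec.scratchdir</name>",
  "    <value>C:\\tmp\\hive</value>",
  "    <description>Scratch space for Hive jobs</description>",
  "  </property>",
  "</configuration>"
)
writeLines(hive_site, file.path(conf_dir, "hive-site.xml"))

# Point Spark/Hadoop at the config directory and the winutils location.
Sys.setenv(HADOOP_CONF_DIR = conf_dir, HADOOP_HOME = hadoop_home)

# Open up the scratch directory, but only if winutils.exe is really there.
winutils <- file.path(hadoop_home, "bin", "winutils.exe")
if (file.exists(winutils)) {
  system2(winutils, c("chmod", "-R", "777", "C:\\tmp\\hive"))
} else {
  message("winutils.exe not found at ", winutils, "; skipping chmod")
}
```

After this, the `sparkR.init()` and `sparkRHive.init(sc)` calls from the question can be run in the same session.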

Upvotes: 0

desertnaut

Reputation: 60319

If you just followed the tutorial's instructions, you simply do not have Hive installed (try running hive from the command line). I have found that this is a common point of confusion for Spark beginners: "pre-built for Hadoop" does not mean that Spark needs Hadoop, let alone that it includes Hadoop (it does not), and the same holds for Hive.
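The command-line check can also be done from R. This is a minimal sketch, and it only tells you whether an executable named hive is visible on the PATH of the R session, nothing more:

```r
# Look up an executable called "hive" on the PATH; Sys.which() returns ""
# when nothing is found.
hive_path <- Sys.which("hive")

if (nzchar(hive_path)) {
  message("hive found at: ", hive_path)
} else {
  message("no hive executable found on the PATH")
}
```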

Upvotes: 1
