MrRobot

Reputation: 501

How to run Spark Streaming application on Windows 10?

I run a Spark Streaming application on MS Windows 10 64-bit that stores data in MongoDB using the spark-mongo-connector.

Whenever I run the Spark application, even pyspark, I get the following exception:

Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-

Full stack trace:

Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
  at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
  at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
  ... 32 more

I use Hadoop 3.0.0 alpha1, which I installed locally myself, with the HADOOP_HOME environment variable pointing to the Hadoop directory and %HADOOP_HOME%\bin on the PATH environment variable.

So I tried to do the following:

> hdfs dfs -ls /tmp
Found 1 items
drw-rw-rw-   -          0 2016-12-26 16:08 /tmp/hive

I tried to change the permissions as follows:

hdfs dfs -chmod 777 /tmp/hive

but this command outputs:

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

I seem to be missing Hadoop's native library for my OS; after looking it up, it also appears that I need to recompile libhadoop.so.1.0.0 for a 64-bit platform.

Where can I find the native library for Windows 10 64-bit? Or is there another way of solving this, apart from the library?

Upvotes: 0

Views: 1614

Answers (1)

Jacek Laskowski

Reputation: 74669

First of all, you don't have to install Hadoop to use Spark, including the Spark Streaming module, with or without MongoDB.

Since you're on Windows, there is a known issue due to NTFS's POSIX incompatibility, so you have to have winutils.exe on the PATH, because Spark uses Hadoop JARs under the covers (for file system access). You can download winutils.exe from https://github.com/steveloughran/winutils. Download the one from hadoop-2.7.1 if you don't know which version you should use (but it should really reflect the version of Hadoop your Spark Streaming build was compiled against, e.g. Hadoop 2.7.x for Spark 2.0.2).
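As a sketch, assuming you saved winutils.exe under C:\hadoop\bin (the directory is just an illustration; use wherever you put it), the environment can be set up from an elevated Command Prompt like this:

```shell
:: Point HADOOP_HOME at the directory that contains bin\winutils.exe
:: (C:\hadoop is a hypothetical location)
setx HADOOP_HOME C:\hadoop

:: Append %HADOOP_HOME%\bin to the user PATH so Spark can find winutils.exe
setx PATH "%PATH%;%HADOOP_HOME%\bin"
```

Note that setx only affects new shells, so open a fresh Command Prompt before starting Spark.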

Create the c:\tmp\hive directory and execute the following as admin (aka Run As Administrator):

winutils.exe chmod -R 777 \tmp\hive
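To confirm the permissions took effect, you can list the directory with winutils itself (the exact output format may vary between winutils builds):

```shell
:: List \tmp\hive with Hadoop-style permission bits;
:: after the chmod above it should show rwx for user, group, and other
winutils.exe ls \tmp\hive
```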

PROTIP Read Problems running Hadoop on Windows for the Apache Hadoop project's official answer.

The message below is harmless and you can safely disregard it.

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform

Upvotes: 1
