Reputation: 133
I am using Apache Spark on a Windows 10 64-bit machine. I have installed Java, Python 3.6, and spark-2.3.1-bin-hadoop2.7. I am using the VSCode editor for PySpark coding.
When I execute the Python Spark code in VSCode using spark-submit, it shows
Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
and then terminates the execution.
Relevant code:
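from pyspark import SparkContext, SparkConf

if __name__ == "__main__":
    # Run locally with two worker threads
    conf = SparkConf().setAppName("word count").setMaster("local[2]")
    sc = SparkContext(conf=conf)

    # Path is relative to the directory spark-submit is run from
    lines = sc.textFile("in/word_count.text")
    words = lines.flatMap(lambda line: line.split(" "))

    # countByValue() returns a dict-like mapping of word -> count on the driver
    wordcounts = words.countByValue()
    for word, count in wordcounts.items():
        print("{} : {}".format(word, count))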
Spark Execution Error:
Upvotes: 0
Views: 3976
Reputation: 408
You can safely ignore the warning, as it is not the reason your job is shutting down. According to the Hadoop documentation:
The native hadoop library is supported on *nix platforms only. The library does not work with Cygwin or the Mac OS X platform.
The native hadoop library is mainly used on the GNU/Linux platform and has been tested on these distributions: RHEL4/Fedora, Ubuntu, and Gentoo.
On all the above distributions, a 32/64-bit native hadoop library will work with a respective 32/64-bit JVM.
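Since the warning itself is harmless, the actual failure is probably somewhere else in the setup. As a rough sketch (not a diagnosis of your specific error), the standard-library script below checks a few things that commonly cause trouble with Spark on Windows: whether JAVA_HOME, SPARK_HOME and HADOOP_HOME are set, whether winutils.exe exists under HADOOP_HOME\bin, and whether the input file can be found from the current working directory. The function name check_spark_prereqs and the choice of checks are my own assumptions for illustration; they are not part of Spark.

```python
import os
from pathlib import Path


def check_spark_prereqs(input_path, env=None):
    """Return a list of human-readable problems; an empty list means none were found.

    Hypothetical helper: the set of checks is an assumption, not an official list.
    """
    env = os.environ if env is None else env
    problems = []

    # Environment variables commonly set for a downloaded Spark distribution
    for var in ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME"):
        if not env.get(var):
            problems.append("{} is not set".format(var))

    # On Windows, Hadoop's file system layer usually looks for bin\winutils.exe
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home and not (Path(hadoop_home) / "bin" / "winutils.exe").is_file():
        problems.append("winutils.exe not found under HADOOP_HOME/bin")

    # A relative path is resolved against the current working directory
    if not Path(input_path).is_file():
        problems.append("input file not found: {}".format(input_path))

    return problems


if __name__ == "__main__":
    for problem in check_spark_prereqs("in/word_count.text"):
        print("WARNING:", problem)
```

Run it with plain python from the same directory you run spark-submit from. If it prints nothing, these particular checks passed and the full stack trace from spark-submit is the next thing to look at.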
Upvotes: 3