Reputation: 700
Intro
I know that in 99% of cases the answer to this error message:
WARN NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
is simply "It's just a warning, don't worry about it", sometimes followed by "Just download the libraries, compile them, point HADOOP_HOME to that folder, and add $HADOOP_HOME/bin/native to your LD_LIBRARY_PATH".
That's what I did, but I'm still getting the warning. After two days of googling I'm starting to feel there's something really interesting to learn if I manage to fix this. There is currently some strange behaviour that I do not understand; hopefully we can work through it together.
Ok, so here's what's up:
Hadoop finds the native libraries
Running hadoop checknative -a gives me this:
dds-MacBook-Pro-2:~ Rkey$ hadoop checknative -a
2018-07-15 16:18:25,956 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
2018-07-15 16:18:25,959 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2018-07-15 16:18:25,963 WARN erasurecode.ErasureCodeNative: ISA-L support is not available in your platform... using builtin-java codec where applicable
Native library checking:
hadoop: true /usr/local/Cellar/hadoop/3.1.0/lib/native/libhadoop.dylib
zlib: true /usr/lib/libz.1.dylib
zstd : false
snappy: true /usr/local/lib/libsnappy.1.dylib
lz4: true revision:10301
bzip2: false
openssl: false build does not support openssl.
ISA-L: false libhadoop was built without ISA-L support
2018-07-15 16:18:25,986 INFO util.ExitUtil: Exiting with status 1: ExitException
There are some errors here, which might be the cause, but most importantly for now this line is present:
hadoop: true /usr/local/Cellar/hadoop/3.1.0/lib/native/libhadoop.dylib
When I start my hadoop cluster, this is how it looks:
dds-MacBook-Pro-2:~ Rkey$ hstart
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [dds-MacBook-Pro-2.local]
Starting resourcemanager
Starting nodemanagers
No warnings. I downloaded the hadoop source and built it myself; before I did that, there were "Cannot find native library" warnings when starting hadoop.
However, spark does not find the native libraries
This is how it looks when I run pyspark:
dds-MacBook-Pro-2:~ Rkey$ pyspark
Python 3.7.0 (default, Jun 29 2018, 20:13:53)
[Clang 8.0.0 (clang-800.0.42.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
2018-07-15 16:22:22 WARN NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 2.3.1
/_/
This is where our old friend makes a reappearance:
WARN NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
I find this very strange, because I know for a fact that it uses the same hadoop installation that I can start on my own without any warnings. There are no other hadoop installations on my computer.
Clarifications
I downloaded the non-hadoop version of apache-spark from their website, called "Pre-built with user-provided Apache Hadoop". This was then put in my Cellar folder simply because I did not want to re-link everything.
As for variables, this is my ~/.profile
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_172.jdk/Contents/Home
export OPENSSL_ROOT_DIR=/usr/local/Cellar/openssl/1.0.2o_2
export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=python3
export HADOOP_HOME=/usr/local/Cellar/hadoop/3.1.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/usr/local/spark
export PATH=$SPARK_HOME/bin:$PATH
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
alias hstart="$HADOOP_HOME/sbin/start-dfs.sh;$HADOOP_HOME/sbin/start-yarn.sh"
alias hstop="$HADOOP_HOME/sbin/stop-dfs.sh;$HADOOP_HOME/sbin/stop-yarn.sh"
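One detail I am unsure about (an assumption on my part, based on how the macOS dynamic linker works): on macOS the linker consults DYLD_LIBRARY_PATH rather than LD_LIBRARY_PATH, so a sketch of an extra line for ~/.profile would be:

```shell
# macOS's dynamic linker reads DYLD_LIBRARY_PATH, not LD_LIBRARY_PATH.
# Same HADOOP_HOME as defined above; repeated here so the snippet
# stands alone.
export HADOOP_HOME=/usr/local/Cellar/hadoop/3.1.0
export DYLD_LIBRARY_PATH="$HADOOP_HOME/lib/native:$DYLD_LIBRARY_PATH"
```

Whether this helps may depend on SIP (see the answer below), which can strip these variables from protected processes.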
And here are my additions to spark-env.sh:
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
export LD_LIBRARY_PATH=/usr/local/Cellar/hadoop/3.1.0/lib/native/:$LD_LIBRARY_PATH
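As an alternative to environment variables, Spark can also be pointed at the directory through its own configuration: the spark.driver.extraLibraryPath and spark.executor.extraLibraryPath settings are passed to the JVM's native-library search path directly and do not depend on the shell environment. A sketch of the two lines for $SPARK_HOME/conf/spark-defaults.conf (path taken from the checknative output above):

```
spark.driver.extraLibraryPath    /usr/local/Cellar/hadoop/3.1.0/lib/native
spark.executor.extraLibraryPath  /usr/local/Cellar/hadoop/3.1.0/lib/native
```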
This is how the folder /usr/local/Cellar/hadoop/3.1.0/lib/native looks: (screenshot of the folder contents not included)
The question
How is it that hadoop can start locally without warning that its native libraries are missing, and hadoop checknative -a shows that it finds the native libraries, yet when the same hadoop is launched through pyspark I am suddenly given this warning again?
Update 16/7
I recently made a discovery. The standard version of this classic error message looks like this:
WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
This is actually different: my error message says NativeCodeLoader:60, with 60 instead of 62. This points towards my theory that it is not the hadoop libraries themselves that are missing, but some native libraries that hadoop uses. That would explain why hadoop can launch without warnings while pyspark, which likely tries to use more of hadoop's native libraries, launches with them.
This is still just a theory; until I remove all warnings from the checknative -a call, I cannot know for sure.
Update 15/7
Currently trying to remove the "WARN bzip2.Bzip2Factory" warning from hadoop checknative -a; perhaps this will also remove the warning when launching pyspark.
Upvotes: 2
Views: 1662
Reputation: 11
I had the same problem. In my case, the cause is that, since macOS X El Capitan, the SIP (System Integrity Protection) mechanism makes the operating system ignore LD_LIBRARY_PATH/DYLD_LIBRARY_PATH, even if you have already added the Hadoop native library directory to one of these variables (I got this information from https://help.mulesoft.com/s/article/Variables-LD-LIBRARY-PATH-DYLD-LIBRARY-PATH-are-ignored-on-MAC-OS-if-System-Integrity-Protect-SIP-is-enable).
Actually, the NativeCodeLoader warning from Spark can be ignored. However, if you really want it to go away, you can disable SIP on macOS and then make sure $HADOOP_HOME/lib/native is added to LD_LIBRARY_PATH. After that, Spark can find the Hadoop native library correctly.
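To see whether SIP is the culprit on your machine, you can query its status first (a sketch; csrutil only exists on macOS, and actually disabling SIP requires booting into Recovery Mode and running csrutil disable there):

```shell
# SIP is what makes macOS strip LD_LIBRARY_PATH/DYLD_LIBRARY_PATH
# from the environment of protected processes.
if command -v csrutil >/dev/null 2>&1; then
  SIP_STATUS=$(csrutil status)   # e.g. "System Integrity Protection status: enabled."
else
  SIP_STATUS="csrutil not available (not macOS)"
fi
echo "$SIP_STATUS"
```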
Upvotes: 1