Reputation: 93
I have a Hadoop FileSystem that uses native libraries via JNI.
Apparently I have to provide the shared object independently of the currently executed job, but I can't find a way to tell Hadoop/YARN where it should look for the shared object.
I had partial success with the following approaches while running the wordcount example on YARN.
Setting export JAVA_LIBRARY_PATH=/path when starting the resource- and the nodemanager.
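Roughly, that means something like this before starting the YARN daemons (a sketch; /path stands for the directory containing the shared object, and I'm assuming the stock yarn-daemon.sh start scripts):
# export the library path, then start the daemons from the Hadoop install dir
export JAVA_LIBRARY_PATH=/path
sbin/yarn-daemon.sh start resourcemanager
sbin/yarn-daemon.sh start nodemanager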
This helps for the resource- and the nodemanager, but the actual job/application fails. Printing the LD_LIBRARY_PATH
and the java.library.path
while executing the wordcount example yields the following result:
/logs/userlogs/application_x/container_x_001/stdout
...
java.library.path : /tmp/hadoop-u/nm-local-dir/usercache/u/appcache/application_x/container_x_001:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
LD_LIBRARY_PATH : /tmp/hadoop-u/nm-local-dir/usercache/u/appcache/application_x/container_x
Setting yarn.app.mapreduce.am.env="LD_LIBRARY_PATH=/path"
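Presumably the equivalent mapred-site.xml entry looks like this (a sketch; /path is again a placeholder):
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>LD_LIBRARY_PATH=/path</value>
</property>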
This did help with some of the jobs. The actual map/reduce job did work (at least I got the correct results), but the call failed with the error no jni-xtreemfs in java.library.path.
Somehow the first application/job worked and shows:
/logs/userlogs/application_x/container_x_001/stdout
...
java.library.path : /tmp/hadoop-u/nm-local-dir/usercache/u/appcache/application_x/container_x_001:/path:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
LD_LIBRARY_PATH : /tmp/hadoop-u/nm-local-dir/usercache/u/appcache/application_x/container_x_001:/path
But the second and all subsequent ones failed with:
/logs/userlogs/application_x/container_x_002/stdout
...
java.library.path : /tmp/hadoop-u/nm-local-dir/usercache/u/appcache/application_x/container_x_002:/opt/hadoop-2.7.1/lib/native:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
LD_LIBRARY_PATH : /tmp/hadoop-u/nm-local-dir/usercache/u/appcache/application_x/container_x_002/opt/hadoop-2.7.1/lib/native
The stacktrace for the latter shows that the error occurred while executing YarnChild:
2015-08-03 15:24:03,851 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.UnsatisfiedLinkError: no jni-xtreemfs in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1886)
at java.lang.Runtime.loadLibrary0(Runtime.java:849)
at java.lang.System.loadLibrary(System.java:1088)
at org.xtreemfs.common.libxtreemfs.jni.NativeHelper.loadLibrary(NativeHelper.java:54)
at org.xtreemfs.common.libxtreemfs.jni.NativeClient.<clinit>(NativeClient.java:41)
at org.xtreemfs.common.libxtreemfs.ClientFactory.createClient(ClientFactory.java:72)
at org.xtreemfs.common.libxtreemfs.ClientFactory.createClient(ClientFactory.java:51)
at org.xtreemfs.common.clients.hadoop.XtreemFSFileSystem.initialize(XtreemFSFileSystem.java:191)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Supplying the libjni-xtreemfs.so via the command-line argument -files.
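For reference, the invocation was along these lines (a sketch; the examples jar location and the input/output paths are placeholders):
yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount -files /path/libjni-xtreemfs.so <input> <output>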
This does work. I assume the .so is copied to the tmp directory. But it is not a feasible solution, because it would require users to supply the path to the .so on every call.
Does anybody know how I can globally set the LD_LIBRARY_PATH
or the java.library.path
, or can suggest which configuration options I have probably missed? I'd be very thankful!
Upvotes: 2
Views: 8353
Reputation: 681
Use mapreduce.map.env in your job or site configuration. Usage is as follows:
<property>
  <name>mapreduce.map.env</name>
  <value>LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/my/libs</value>
</property>
Note: the Hadoop docs encourage the use of mapreduce.map.env for this over mapred.child.java.opts: "Usage of -Djava.library.path can cause programs to no longer function if hadoop native libraries are used."
Upvotes: 2
Reputation: 3686
Short answer: in your mapred-site.xml, put the following:
<property>
  <name>mapred.child.java.opts</name>
  <value>-Djava.library.path=$PATH_TO_NATIVE_LIBS</value>
</property>
Explanation: The jobs/applications aren't executed by YARN itself but by a mapred (map/reduce) container, whose configuration is controlled by the mapred-site.xml file. Specifying custom Java parameters there causes the actual workers to spin up with the correct library path.
Upvotes: 5