I'm new to Cloudera and am attempting to move workloads from an HDP server running Ambari with Livy and Spark 2.2.x to a CDH 5 server with a similar setup. Since Livy is not a component of Cloudera, I'm using version 0.5.0-incubating from the Livy website, running it on one of the servers that also hosts the YARN, Spark and HDFS masters.
To keep a very, very long story short, when I submit a job to Livy, I get this error message:
Diagnostics: File file:/home/livy/livy-0.5.0-incubating-bin/rsc-jars/livy-rsc-0.5.0-incubating.jar does not exist
java.io.FileNotFoundException: File file:/home/livy/livy-0.5.0-incubating-bin/rsc-jars/livy-rsc-0.5.0-incubating.jar does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:598)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:811)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:588)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:432)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:364)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Failing this attempt. Failing the application.
The jar it references is part of the Livy installation and definitely exists. At some point in the process, Hadoop seems to look for the file at the URL file:/home... instead of just /home... or file:///home..., but I'm not sure that's even relevant, as this may be a valid path for HDFS. I've gone as far as building multiple versions of Livy from source, modifying the launch script, and remote-debugging it, but this error seems to occur somewhere in Spark.
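For what it's worth, the file:/home/... form is not malformed. A minimal sketch using only the Python standard library (the URIs are copied from the messages in this post) shows that file:/home/... and file:///home/... parse to the same scheme and path, while the HDFS URL from the Livy log carries a host and port. A file: scheme means the path was resolved against the local filesystem, which matches the RawLocalFileSystem frames in the stack trace, rather than HDFS.

```python
from urllib.parse import urlparse

# URIs taken from the error messages and log lines in this post.
SHORT_FORM = "file:/home/livy/livy-0.5.0-incubating-bin/rsc-jars/livy-rsc-0.5.0-incubating.jar"
LONG_FORM = "file:///home/livy/livy-0.5.0-incubating-bin/rsc-jars/livy-rsc-0.5.0-incubating.jar"
HDFS_FORM = "hdfs://some-server:8020/user/livy/jars/livy-examples-0.4.0-SNAPSHOT.jar"


def describe(uri):
    """Return (scheme, authority, path) for a Hadoop-style URI string."""
    parts = urlparse(uri)
    return parts.scheme, parts.netloc, parts.path


for uri in (SHORT_FORM, LONG_FORM, HDFS_FORM):
    print(describe(uri))
```

The first two lines of output are identical (scheme "file", empty authority, path under /home/livy), so the two spellings name the same local file.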
Here is my livy.conf file:
# What spark master Livy sessions should use.
livy.spark.master = yarn
# What spark deploy mode Livy sessions should use.
livy.spark.deploy-mode = cluster
livy.file.upload.max.size 300000000
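The last line uses whitespace instead of = as its separator. Assuming Livy reads livy.conf with java.util.Properties-style rules (an assumption worth checking against the Livy source), where a key ends at the first =, : or whitespace, all three settings are read the same way. The hypothetical, simplified parser below illustrates those rules; it ignores escapes and line continuations and is not Livy's own loader.

```python
import re

LIVY_CONF = """\
# What spark master Livy sessions should use.
livy.spark.master = yarn
# What spark deploy mode Livy sessions should use.
livy.spark.deploy-mode = cluster
livy.file.upload.max.size 300000000
"""

# Key ends at the first '=', ':' or whitespace; then optional whitespace,
# at most one '=' or ':' separator, and more optional whitespace.
_LINE = re.compile(r"^([^=:\s]+)\s*[=:]?\s*(.*)$")


def parse_properties(text):
    """Simplified Java-properties parser (hypothetical): no escapes or continuations."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # blank lines and comment lines are skipped
        match = _LINE.match(line)
        if match:
            props[match.group(1)] = match.group(2).strip()
    return props


print(parse_properties(LIVY_CONF))
```

Under these rules, "livy.file.upload.max.size 300000000" and "livy.file.upload.max.size = 300000000" are equivalent.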
And livy-env.sh:
export HADOOP_CONF_DIR=/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/etc/hadoop
export SPARK_HOME=/opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2
export HADOOP_HOME=/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/lib/hadoop
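Hadoop clients resolve unqualified paths against the filesystem named by fs.defaultFS in core-site.xml under HADOOP_CONF_DIR, and when that property is absent Hadoop falls back to its built-in default of file:///, the local filesystem. One quick sanity check on the directory exported above is to read that property directly. The helper below is a hypothetical sketch using only the Python standard library, assuming the usual Hadoop *-site.xml layout of property elements with name and value children.

```python
import os
import xml.etree.ElementTree as ET

HADOOP_DEFAULT_FS = "file:///"  # Hadoop's built-in default for fs.defaultFS


def default_fs_from_xml(xml_text):
    """Return fs.defaultFS (or the deprecated fs.default.name) from core-site.xml text."""
    values = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            values[name.strip()] = value.strip()
    return values.get("fs.defaultFS", values.get("fs.default.name", HADOOP_DEFAULT_FS))


def default_fs(conf_dir):
    """Hypothetical check: which default filesystem a Hadoop client in conf_dir would use."""
    site = os.path.join(conf_dir, "core-site.xml")
    if not os.path.isfile(site):
        return HADOOP_DEFAULT_FS  # no core-site.xml: every unqualified path is local
    with open(site, encoding="utf-8") as handle:
        return default_fs_from_xml(handle.read())


if __name__ == "__main__":
    conf_dir = os.environ.get("HADOOP_CONF_DIR", ".")
    print(conf_dir, "->", default_fs(conf_dir))
```

Run with the same HADOOP_CONF_DIR that livy-env.sh exports; a result of file:/// would mean Spark sees no HDFS default for that directory.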
The old cluster used Hadoop 2.7.3.2.6.5.0-141 and Spark 2.2.1. The new cluster runs Hadoop 2.6.0-cdh5.14.2 and Spark 2.2.0.cloudera2. Both the old cluster's Livy distribution and Cloudera's own Livy distribution gave the same basic error. Again, all of this worked fine on the previous HDP/Ambari cluster.
All of those jar files exist at that path on every node. I've also tried this with the jars in HDFS; Livy extracts them and then gives the same error message for the extracted jars. I also tried various permission changes, but none of them helped. For example, with the jars in HDFS, I get:
18/06/09 00:13:12 INFO util.LineBufferedStream: (stdout: ,18/06/09 00:13:11 INFO yarn.Client: Uploading resource hdfs://some-server:8020/user/livy/jars/livy-examples-0.4.0-SNAPSHOT.jar -> file:/home/livy/.sparkStaging/application_1528398117244_0054/livy-examples-0.4.0-SNAPSHOT.jar)
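That destination is consistent with the default filesystem being local. In Spark 2.x the YARN client stages files under the default filesystem's home directory plus .sparkStaging/<application id>, unless spark.yarn.stagingDir is set; on the local filesystem the home directory is the user's Unix home, while on HDFS it is /user/<user>. The function below is a hypothetical, simplified model of that rule (not Spark's code) that reproduces the staging directory from the log line when the default filesystem is file:///.

```python
from urllib.parse import urlparse


def staging_dir(default_fs, user, unix_home, app_id):
    """Simplified model (hypothetical) of where Spark's YARN client stages files.

    Local filesystem: home is the Unix home, e.g. file:/home/livy.
    HDFS: home is /user/<user> on the namenode named in default_fs.
    """
    parsed = urlparse(default_fs)
    if parsed.scheme == "file":
        home = "file:" + unix_home
    else:
        home = f"{parsed.scheme}://{parsed.netloc}/user/{user}"
    return f"{home}/.sparkStaging/{app_id}"


APP_ID = "application_1528398117244_0054"
print(staging_dir("file:///", "livy", "/home/livy", APP_ID))
# file:/home/livy/.sparkStaging/application_1528398117244_0054
print(staging_dir("hdfs://some-server:8020", "livy", "/home/livy", APP_ID))
# hdfs://some-server:8020/user/livy/.sparkStaging/application_1528398117244_0054
```

The first result matches the directory in the YARN diagnostics; YARN then tries to localize the jar from that file: path on whichever node runs the container.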
from Livy's output, followed by...
Diagnostics: File file:/home/livy/.sparkStaging/application_1528398117244_0054/livy-examples-0.4.0-SNAPSHOT.jar does not exist
java.io.FileNotFoundException: File file:/home/livy/.sparkStaging/application_1528398117244_0054/livy-examples-0.4.0-SNAPSHOT.jar does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:598)
...
from YARN's inevitable failure.
Does anyone have any thoughts? I'd even be happy just to hear about alternatives to Livy, if there are any.
I fixed this by building Livy from the Cloudera repo with this command:
mvn clean package -DskipTests -Dspark-2.2.0.cloudera2 -Dscala-2.10
This version is outdated and has a broken UI, and some of its Scala tests fail, so they have to be skipped. I didn't bother looking into how or why specifying 2.2.0.cloudera2 works. I also had to install Hue and its dependent services on the cluster. No other distribution of Livy, binary or source, worked.