Gayatri

Reputation: 2253

Zeppelin with Spark2

I am trying to configure Zeppelin to work with Spark2 on Cloudera version 5.10.1.

[Screenshot of the Spark interpreter]

I get the error "org.apache.zeppelin.interpreter.InterpreterException:opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/bin/spark2-submit/bin/spark-submit: Not a directory"

Clearly it appends "/bin/spark-submit" to the path. How do I correct this?

Upvotes: 1

Views: 1881

Answers (2)

piotrektt

Reputation: 189

Setting SPARK_HOME to "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/" may not be enough. In my case Spark2 started working in Zeppelin when I set SPARK_HOME to:

SPARK_HOME=/opt/cloudera/parcels/SPARK2/lib/spark2

*SPARK2 is a symlink to the parcel's longer, versioned directory name.
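
A quick way to sanity-check a candidate SPARK_HOME before restarting Zeppelin. This is only a sketch: it assumes, as the error in the question suggests, that Zeppelin launches "$SPARK_HOME/bin/spark-submit". The helper name check_spark_home is made up for illustration and is not part of Zeppelin or Spark.

```shell
#!/bin/bash
# check_spark_home is a hypothetical helper, not part of Zeppelin or Spark.
# It only tests whether <candidate>/bin/spark-submit exists and is executable.
check_spark_home() {
  local candidate="${1%/}"   # drop one trailing slash, if present
  if [ -x "$candidate/bin/spark-submit" ]; then
    echo "ok: $candidate/bin/spark-submit"
  else
    echo "missing: $candidate/bin/spark-submit" >&2
    return 1
  fi
}

# Example call with the path from this answer; adjust to your parcel layout.
check_spark_home /opt/cloudera/parcels/SPARK2/lib/spark2 || true
```

If the check reports "missing", the directory is probably not the one Zeppelin expects, whatever else it contains.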

To expand on my answer: @molotow's solution gave me an error like this in Zeppelin:

org.apache.zeppelin.interpreter.InterpreterException: /opt/cloudera/parcels/SPARK2/bin/spark-submit: line 17: //../../CDH/lib/bigtop-utils/bigtop-detect-javahome: No such file or directory
/opt/cloudera/parcels/SPARK2/bin/spark-submit: line 19: //../lib/spark2/bin/spark-submit: No such file or directory

This may be related to how 'spark2-submit' locates the paths it needs to work, mainly:

#!/bin/bash
# Reference: http://stackoverflow.com/questions/59895/can-a-bash-script-tell-what-directory-its-stored-in
SOURCE="${BASH_SOURCE[0]}"
BIN_DIR="$( dirname "$SOURCE" )"
while [ -h "$SOURCE" ]
do
  SOURCE="$(readlink "$SOURCE")"
  [[ $SOURCE != /* ]] && SOURCE="$DIR/$SOURCE"
  BIN_DIR="$( cd -P "$( dirname "$SOURCE"  )" && pwd )"
done
BIN_DIR="$( cd -P "$( dirname "$SOURCE" )" && pwd )"
CDH_LIB_DIR=$BIN_DIR/../../CDH/lib
LIB_DIR=$BIN_DIR/../lib
export HADOOP_HOME=$CDH_LIB_DIR/hadoop
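
One possible reading of the "//../.." paths, offered as a guess rather than a confirmed diagnosis: the loop in the excerpt uses $DIR, which the excerpt never sets, so a relative symlink target such as "spark2-submit" would be rewritten to "/spark2-submit" and BIN_DIR would come out as "/". The sketch below replays that loop in a scratch directory. The function name resolve_bin_dir and the temporary layout are made up for illustration, and DIR is left empty on purpose to mirror the excerpt.

```shell
#!/bin/bash
# Hypothetical reproduction in a scratch directory; no Cloudera files are touched.
resolve_bin_dir() {
  # Same resolution loop as the excerpt; DIR is declared but never assigned,
  # so "$DIR" expands to an empty string, as it would in the excerpt.
  local SOURCE="$1" BIN_DIR DIR
  while [ -h "$SOURCE" ]; do
    SOURCE="$(readlink "$SOURCE")"
    [[ $SOURCE != /* ]] && SOURCE="$DIR/$SOURCE"
    BIN_DIR="$( cd -P "$( dirname "$SOURCE" )" && pwd )"
  done
  BIN_DIR="$( cd -P "$( dirname "$SOURCE" )" && pwd )"
  echo "$BIN_DIR"
}

tmp="$(mktemp -d)"
mkdir -p "$tmp/bin"
touch "$tmp/bin/spark2-submit"
# Relative symlink, created the same way as "ln -s spark2-submit spark-submit".
( cd "$tmp/bin" && ln -s spark2-submit spark-submit )

via_link="$(resolve_bin_dir "$tmp/bin/spark-submit")"
direct="$(resolve_bin_dir "$tmp/bin/spark2-submit")"
echo "via relative symlink: $via_link"
echo "direct call:          $direct"
rm -rf "$tmp"
```

Under these assumptions the symlinked call resolves to "/", which would turn $BIN_DIR/../../CDH/lib into "//../../CDH/lib" as in the error, while the direct call resolves to the real bin directory. Pointing SPARK_HOME at lib/spark2 avoids going through that loop via a relative symlink.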

Hope that helps someone. :)

Upvotes: 3

tardis

Reputation: 1370

Set the SPARK_HOME variable (in the file conf/zeppelin-env.sh of your Zeppelin installation) to the base directory of your Spark installation, that is "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/". If the remaining problem is the name "spark2-submit" vs. "spark-submit", I would create a symlink from the shell with

cd /opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/bin/
ln -s spark2-submit spark-submit

Upvotes: 3
