Julián Gómez

Reputation: 351

Apache zeppelin: Spark cluster configuration

I'm new to PySpark and I'm using Apache Zeppelin 0.7.1 to access my Spark cluster. I configured two machines:

Situation:

Following this Zeppelin documentation, I set spark://Machine-1:7077 as the master property of the Spark interpreter configuration. Some code then runs fine from the cells of my Zeppelin notebook:

%spark
sc.version
sc.getConf.get("spark.home")
System.getenv().get("PYTHONPATH")
System.getenv().get("SPARK_HOME")

but other RDD transformations, for instance, never finish:

%pyspark
input_file = "/tmp/kddcup.data_10_percent.gz"
raw_rdd = sc.textFile(input_file)

What's wrong? Any advice? Thank you in advance.

Upvotes: 0

Views: 804

Answers (1)

Julián Gómez

Reputation: 351

Eventually I realised that:

  1. The memory and cores parameters for the workers were not suitable for my cluster. I changed the values in the spark-env.sh files and it's working (see the sketch below).
  2. The configuration parameters in Apache Zeppelin also had some mistakes (some extra Spark modules were needed).
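For reference, a minimal sketch of the kind of changes both points refer to. The cores/memory values, the SPARK_HOME path and the --packages placeholder are only illustrative; size them to your own machines and to whatever extra modules you actually need.

# conf/spark-env.sh on each worker machine (illustrative values)
export SPARK_WORKER_CORES=2      # cores this worker may offer to executors
export SPARK_WORKER_MEMORY=2g    # memory this worker may offer to executors

# conf/zeppelin-env.sh (one way to pull extra Spark modules into Zeppelin)
export SPARK_HOME=/opt/spark                               # assumed install path
export SPARK_SUBMIT_OPTIONS="--packages <group:artifact:version>"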

Thank you, Greg, for your interest.

Upvotes: 0
