SpmP

Reputation: 557

Apache Spark: setting spark.eventLog.enabled and spark.eventLog.dir at submit or Spark start

I would like to set spark.eventLog.enabled and spark.eventLog.dir at the spark-submit or start-all level, rather than having to enable them in the Scala/Java/Python code. I have tried several approaches with no success:

Setting spark-defaults.conf as

spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://namenode:8021/directory

or

spark.eventLog.enabled           true
spark.eventLog.dir               file:///some/where

Running spark-submit as:

spark-submit --conf "spark.eventLog.enabled=true" --conf "spark.eventLog.dir=file:///tmp/test" --master spark://server:7077 examples/src/main/python/pi.py

Starting spark with environment variables:

SPARK_DAEMON_JAVA_OPTS="-Dspark.eventLog.enabled=true -Dspark.history.fs.logDirectory=$sparkHistoryDir -Dspark.history.provider=org.apache.spark.deploy.history.FsHistoryProvider -Dspark.history.fs.cleaner.enabled=true -Dspark.history.fs.cleaner.interval=2d"

and just for overkill:

SPARK_HISTORY_OPTS="-Dspark.eventLog.enabled=true -Dspark.history.fs.logDirectory=$sparkHistoryDir -Dspark.history.provider=org.apache.spark.deploy.history.FsHistoryProvider -Dspark.history.fs.cleaner.enabled=true -Dspark.history.fs.cleaner.interval=2d"

Where and how must these things be set to get history on arbitrary jobs?

Upvotes: 20

Views: 24455

Answers (2)

Sandeep

Reputation: 2165

Create a local directory:

$ mkdir /tmp/spark-events

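Run spark-submit with --conf spark.eventLog.enabled=true (the option needs an explicit key=value form). If spark.eventLog.dir is not set, Spark writes to its default location, file:///tmp/spark-events, which is the directory created above:

$ spark-submit --conf spark.eventLog.enabled=true --class com.MainClass --packages packages_if_any --master "local[4]" app.jar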
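
Spark does not create the event-log directory itself and fails at startup if it is missing, so it can help to create the directory and build the URI in one step. The following is a minimal sketch, not part of Spark: the function name ensure_event_dir is hypothetical, and it assumes an absolute local path.

```shell
# Hypothetical helper (not a Spark command): create a local event-log
# directory if it is missing and print the matching file:// URI.
# Assumes the argument is an absolute path; defaults to /tmp/spark-events.
ensure_event_dir() {
  dir="${1:-/tmp/spark-events}"
  mkdir -p "$dir" || return 1
  printf 'file://%s\n' "$dir"
}
```

The printed URI could then be passed along, for example: --conf "spark.eventLog.dir=$(ensure_event_dir /tmp/spark-events)".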

Upvotes: 1

SpmP

Reputation: 557

I solved the problem, although strangely I had tried this before without success. In any case, this now appears to be a stable solution:

Create a directory in HDFS for logging, say /eventLogging:

hdfs dfs -mkdir /eventLogging

Then spark-shell or spark-submit (or whatever) can be run with the following options:

--conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs://<hdfsNameNodeAddress>:8020/eventLogging

such as:

spark-shell --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs://<hdfsNameNodeAddress>:8020/eventLogging
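
To avoid retyping these options for every job, they can be wrapped in a small shell function. This is only a sketch under stated assumptions: the function name spark_submit_logged and the environment variables EVENT_LOG_DIR and SPARK_SUBMIT are hypothetical conveniences, not Spark settings.

```shell
# Hypothetical wrapper: run spark-submit with event logging always enabled.
# EVENT_LOG_DIR (assumed variable) holds the log directory URI, e.g. the
#   hdfs://.../eventLogging location created above; it must be set.
# SPARK_SUBMIT (assumed variable) optionally overrides the command to run,
#   e.g. spark-shell; it defaults to spark-submit.
spark_submit_logged() {
  "${SPARK_SUBMIT:-spark-submit}" \
    --conf spark.eventLog.enabled=true \
    --conf "spark.eventLog.dir=${EVENT_LOG_DIR:?set EVENT_LOG_DIR first}" \
    "$@"
}
```

After exporting EVENT_LOG_DIR, a job would be launched as spark_submit_logged followed by the usual spark-submit arguments.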

Upvotes: 14
