Reputation: 557
I would like to set spark.eventLog.enabled and spark.eventLog.dir at the spark-submit or start-all level -- not require event logging to be enabled in the Scala/Java/Python code.
I have tried various things with no success:
spark-defaults.conf, as:
spark.eventLog.enabled true
spark.eventLog.dir hdfs://namenode:8021/directory
or
spark.eventLog.enabled true
spark.eventLog.dir file:///some/where
spark-submit, as:
spark-submit --conf "spark.eventLog.enabled=true" --conf "spark.eventLog.dir=file:///tmp/test" --master spark://server:7077 examples/src/main/python/pi.py
environment variables, such as:
SPARK_DAEMON_JAVA_OPTS="-Dspark.eventLog.enabled=true -Dspark.history.fs.logDirectory=$sparkHistoryDir -Dspark.history.provider=org.apache.spark.deploy.history.FsHistoryProvider -Dspark.history.fs.cleaner.enabled=true -Dspark.history.fs.cleaner.interval=2d"
and just for overkill:
SPARK_HISTORY_OPTS="-Dspark.eventLog.enabled=true -Dspark.history.fs.logDirectory=$sparkHistoryDir -Dspark.history.provider=org.apache.spark.deploy.history.FsHistoryProvider -Dspark.history.fs.cleaner.enabled=true -Dspark.history.fs.cleaner.interval=2d"
Where and how must these things be set to get history on arbitrary jobs?
Upvotes: 20
Views: 24455
Reputation: 2165
Create a local directory:
$ mkdir /tmp/spark-events
Run spark-shell (or spark-submit) with event logging enabled:
$ spark-shell --conf spark.eventLog.enabled=true --master local[4]
or, for an application jar:
$ spark-submit --conf spark.eventLog.enabled=true --class com.MainClass --packages packages_if_any --master local[4] app.jar
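This works because, when spark.eventLog.dir is not set, Spark falls back to file:///tmp/spark-events, and that directory must already exist. To actually browse the resulting history, a minimal sketch (assuming a default local install, where the history server also reads file:/tmp/spark-events unless configured otherwise):
$ $SPARK_HOME/sbin/start-history-server.sh    # serves completed applications on port 18080
Then open http://localhost:18080 to see the finished runs.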
Upvotes: 1
Reputation: 557
I solved the problem, though strangely enough I had tried this before... In any case, it now seems like a stable solution:
Create a directory in HDFS for logging, say /eventLogging:
hdfs dfs -mkdir /eventLogging
Then spark-shell or spark-submit (or whatever) can be run with the following options:
--conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs://<hdfsNameNodeAddress>:8020/eventLogging
such as:
spark-shell --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs://<hdfsNameNodeAddress>:8020/eventLogging
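To view those logs afterwards, the history server has to read the same directory. A minimal sketch, reusing the placeholder namenode address above and the SPARK_HISTORY_OPTS variable already shown in the question:
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://<hdfsNameNodeAddress>:8020/eventLogging"
$SPARK_HOME/sbin/start-history-server.sh
Completed applications then show up in the history server UI on port 18080.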
Upvotes: 14