Reputation: 5216
I'm working with Spark and YARN as my resource manager. I'm trying to find a way to gather the resources allocated for a job after it runs. The resource manager only reports current usage, so after the job completes it's zeroed out.
If I can't get them after the fact, is there a way to have the Spark job accumulate stats as it runs and output/store them at the end?
Upvotes: 0
Views: 262
Reputation: 210812
Try using the Spark History Server:
Viewing After the Fact
It is still possible to construct the UI of an application through Spark’s history server, provided that the application’s event logs exist. You can start the history server by executing:
./sbin/start-history-server.sh
This creates a web interface at http://<server-url>:18080
by default, listing incomplete and completed applications and attempts.
When using the file-system provider class (see spark.history.provider below), the base logging directory must be supplied in the spark.history.fs.logDirectory configuration option, and should contain sub-directories, each of which represents an application's event logs.
The Spark jobs themselves must be configured to log events, and to log them to the same shared, writable directory. For example, if the server was configured with a log directory of hdfs://namenode/shared/spark-logs, then the client-side options would be:
spark.eventLog.enabled true
spark.eventLog.dir hdfs://namenode/shared/spark-logs
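Once event logging is enabled, the History Server also exposes a REST API under /api/v1 that reports per-executor resource figures for completed applications, which fits the "gather resources after a run" requirement. As a minimal sketch (the host name and application id below are placeholders, and the stubbed response only illustrates the shape of the real JSON):

```python
import json
from urllib.request import urlopen

# Placeholder host; substitute your History Server address.
HISTORY_SERVER = "http://history-server:18080"

def executor_memory_summary(executors):
    """Sum maxMemory (bytes) across executors from an /executors response."""
    return sum(e.get("maxMemory", 0) for e in executors)

def fetch_executors(app_id):
    # GET /api/v1/applications/<app-id>/executors returns a JSON list with
    # one entry per executor, for completed as well as running applications.
    url = f"{HISTORY_SERVER}/api/v1/applications/{app_id}/executors"
    with urlopen(url) as resp:
        return json.load(resp)

# Stubbed example of what the endpoint returns (abridged):
sample = [
    {"id": "driver", "maxMemory": 434031820},
    {"id": "1", "maxMemory": 2101975449},
]
print(executor_memory_summary(sample))  # total memory across executors, bytes
```

In practice you would call fetch_executors("application_1234_0001") with the YARN application id after the job finishes; the same API also lists applications themselves under /api/v1/applications.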
Upvotes: 1