atxdba

Reputation: 5216

Is there a way to gather stats on resources used after a spark-submit?

I'm working with Spark, with YARN as my resource manager. I'm trying to find a way to gather the resources allocated to a job after it has run. The resource manager only reports current usage, so once the job completes the numbers are zeroed out.

If I can't get them after the fact, is there a way to have the Spark job accumulate stats as it runs and output/store them at the end?

Upvotes: 0

Views: 262

Answers (1)

MaxU - stand with Ukraine

Reputation: 210812

Try using the Spark History Server:

Viewing After the Fact

It is still possible to construct the UI of an application through Spark’s history server, provided that the application’s event logs exist. You can start the history server by executing:

./sbin/start-history-server.sh

This creates a web interface at http://<server-url>:18080 by default, listing incomplete and completed applications and attempts.
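Beyond the web UI, the history server also exposes the same information as JSON through its REST API, which is handy for collecting resource stats in a script after the run. A minimal sketch, where <server-url> and <app-id> are placeholders for your history server host and the application ID of the completed job:

# list completed (and incomplete) applications known to the history server
curl http://<server-url>:18080/api/v1/applications

# per-executor details for one application, including cores and memory
curl http://<server-url>:18080/api/v1/applications/<app-id>/executors

The executors endpoint is a good starting point for "what resources did this job use", since each entry reports the executor's cores and memory alongside task and shuffle metrics.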

When using the file-system provider class (see spark.history.provider below), the base logging directory must be supplied in the spark.history.fs.logDirectory configuration option, and should contain sub-directories that each represents an application’s event logs.
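For example, the matching server-side setting (typically placed in conf/spark-defaults.conf on the machine running the history server), assuming the same HDFS path used in the example below, would be:

spark.history.fs.logDirectory hdfs://namenode/shared/spark-logs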

The Spark jobs themselves must be configured to log events, and to log them to the same shared, writable directory. For example, if the server was configured with a log directory of hdfs://namenode/shared/spark-logs, then the client-side options would be:

spark.eventLog.enabled true
spark.eventLog.dir hdfs://namenode/shared/spark-logs
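Since you're launching via spark-submit, these can also be passed directly on the command line instead of via spark-defaults.conf. A minimal sketch, where your_app.py is a placeholder for your actual application:

spark-submit \
  --master yarn \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=hdfs://namenode/shared/spark-logs \
  your_app.py

With event logging enabled, every completed run leaves its event log in that directory, and the history server can reconstruct the full UI and metrics for it after the fact.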

Upvotes: 1
