Kyle Heuton

Reputation: 9768

Spark Standalone cluster master web UI inaccessible after an application finishes

I have a Spark application that finishes without error. Once it has saved all of its outputs and its process has terminated, the Spark standalone cluster master process becomes a CPU hog, using 16 CPUs full time for hours, and the web UI becomes unresponsive. I have no idea what it could be doing. Is there some complicated cleanup step?

Some more details:

I've got a Spark standalone cluster (27 workers/nodes) that I've been submitting jobs to successfully for a while. I recently scaled up my applications: the largest now takes 3.5 hours using 100 cores across the 27 workers, and each worker does dozens of GB of shuffle read/write over the course of the job. Otherwise, the application is no different from the smaller jobs that have run successfully before.

Upvotes: 0

Views: 610

Answers (1)

Kyle Heuton

Reputation: 9768

This is a known issue with Spark's standalone cluster manager, caused by the massive event log that large applications create. You can read more at the issue tracker link below.

https://issues.apache.org/jira/browse/SPARK-12299

For now, the best workaround is to disable event logging for large jobs.
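Event logging is controlled by the `spark.eventLog.enabled` property, so one way to turn it off for a single large job is to pass `--conf spark.eventLog.enabled=false` to `spark-submit`. Here is a minimal sketch that builds such a command line. The helper name `spark_submit_command`, the script name `my_large_job.py`, and the master URL `spark://master-host:7077` are placeholders I made up for illustration, so substitute your own.

```python
import shlex


def spark_submit_command(app_path, master_url, disable_event_log=True):
    """Build a spark-submit argument list (hypothetical helper, not part of Spark)."""
    cmd = ["spark-submit", "--master", master_url]
    if disable_event_log:
        # spark.eventLog.enabled is a standard Spark property; setting it to
        # false means this application writes no event log.
        cmd += ["--conf", "spark.eventLog.enabled=false"]
    cmd.append(app_path)
    return cmd


# Placeholder application path and master URL; replace with real values.
cmd = spark_submit_command("my_large_job.py", "spark://master-host:7077")
print(shlex.join(cmd))
```

You could hand `cmd` to `subprocess.run` or simply copy the printed line into a shell. Setting the property per job leaves event logging on for the smaller applications. The trade-off is that the large application's UI cannot be reconstructed after it finishes.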

Upvotes: 0
