Preventing SparkListenerBus errors

Question

I am running a spark-submit application on an AWS EMR cluster (EMR 5.0.0, Spark 2.0.0, 30 r3.4xlarge). To launch the script, I SSH into the master node, then run the following command:

time spark-submit --conf spark.sql.shuffle.partitions=5000 \
--conf spark.memory.storageFraction=0.3 --conf spark.memory.fraction=0.95 \
--executor-memory 8G --driver-memory 10G dataframe_script.py

The application uses the default AWS spark configuration, which has spark.master=yarn, and deploy-mode=client.

The application loads ~220GB of data, does SQL-like aggregations, then writes to s3. The written data looks like it was processed correctly. While the code is running, I see an error messaage, but the code continues to run:

ERROR LiveListenerBus: Dropping SparkListenerEvent because no remaining room in event queue. This likely means one of the SparkListeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.

After the application is done writing, the application does not return to the command line for >10 minutes, sending out a a warning:

WARN ExecutorAllocationManager: No stages are running, but numRunningTasks != 0

then tens of thousands of lines with the error message:

16/10/12 00:40:03 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(176,WrappedArray())

The progress bar also continues to move in between the error messages, e.g.:

[Stage 17:=================================================>   (465 + 35) / 500]

My code for the write and end of the main step:

def main():
    # some processing
    df.select(selection_list).write.json('s3path', compression=codec)
    print 'Done saving, shutting down'
    sc.stop()

There is a previous StackOverflow question, which refers to this JIRA. It looks like there was a fix for older versions of Spark, but I don't quite understand what the problem was.

How do I avoid these error messages?

Preventing SparkListenerBus errors

Answers (1)

Related Questions