Stefan R.

Reputation: 112

Spark batches do not complete when running on a Yarn cluster

Setting the scene
I am working to make a Spark Streaming application (Spark 2.2.1 with Scala) run on a Yarn cluster (Hadoop 2.7.4).

So far I have managed to submit the application to the Yarn cluster with spark-submit. I can see that the receiver task starts up correctly and fetches a lot of records from the database (Couchbase Server 5.0), and I can also see that the records are divided into batches.

The question
However, when I look at the Streaming Statistics on the Spark Web UI, I can see that my batches are never processed. I have seen batches with 0 records process and complete, but when a batch with records starts processing, it never completes. One time it even got stuck on a batch with 0 records.

I even tried simplifying the output operations on the StreamingContext as much as possible. But even with the very simple output operation print(), my batches are never processed. The logs do not show any warnings or errors.
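To make the simplification concrete, the pipeline I ended up testing is essentially the following (a minimal sketch: the receiver source here is a placeholder socket stream, not my actual Couchbase stream, and the names are illustrative):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch of the simplified pipeline: 5-second batch interval
// and print() as the only output operation.
val conf = new SparkConf().setAppName("CouchbaseStreamingExample")
val ssc = new StreamingContext(conf, Seconds(5))

// Placeholder receiver-based input; in the real application this is the
// DStream created by the Couchbase Spark Connector.
val stream = ssc.socketTextStream("localhost", 9999)

stream.print() // simplest possible output operation

ssc.start()
ssc.awaitTermination()
```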

Does anyone know what might be wrong? Any suggestions on how to solve this will be much appreciated.

More Info
The main class of the Spark application is built from this example (the first one) from the Couchbase Spark Connector documentation, combined with this checkpointing example from the Spark documentation.
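The checkpoint part follows the standard StreamingContext.getOrCreate pattern from that Spark example. Roughly (a sketch with an illustrative checkpoint directory and factory function, not my exact code):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Illustrative checkpoint location; the real one is a directory on HDFS.
val checkpointDir = "hdfs:///tmp/streaming-checkpoint"

// Factory that builds a fresh context; only invoked when no checkpoint exists.
def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("CouchbaseStreamingExample")
  val ssc = new StreamingContext(conf, Seconds(5))
  ssc.checkpoint(checkpointDir)
  // Here the Couchbase DStream from the connector example is created and
  // wired up to the output operation (print() in the simplified version).
  ssc
}

// Recover from the checkpoint if one exists, otherwise build a new context.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()
```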

Right now I have 3230 active batches (3229 queued and 1 processing) and 1 completed batch (which had 0 records). The application has been running for 4 hours and 30 minutes, and another batch is added every 5 seconds.

If I look at the thread dump for the executors, I see a lot of WAITING, TIMED_WAITING and a few RUNNABLE threads. The list would fill up 3 screenshots, so I will only post it if needed.

Below you will find some screenshots from the Web UI

Executor Overview

Spark Jobs Overview

Node Overview with resources

Capacity Scheduler Overview

Upvotes: 1

Views: 1615

Answers (1)

Vinoth Chinnasamy

Reputation: 470

Per the screenshot, you have 2 cores: one is being used for the driver and the other for the receiver. You don't have a core left for the actual processing to happen. Please increase the number of cores and try again.

Refer: https://spark.apache.org/docs/latest/streaming-programming-guide.html#input-dstreams-and-receivers

If you are using an input DStream based on a receiver (e.g. sockets, Kafka, Flume, etc.), then the single thread will be used to run the receiver, leaving no thread for processing the received data. Hence, when running locally, always use “local[n]” as the master URL, where n > number of receivers to run (see Spark Properties for information on how to set the master).
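On a Yarn cluster, the extra cores can be requested through the executor settings, for example (a sketch; the exact numbers depend on your cluster, and the values shown are only illustrative):

```scala
import org.apache.spark.SparkConf

// Illustrative resource settings: the goal is to leave at least one core free
// for batch processing after the receiver has claimed its core.
val conf = new SparkConf()
  .setAppName("CouchbaseStreamingExample")
  .set("spark.executor.instances", "2") // executors requested from Yarn
  .set("spark.executor.cores", "2")     // cores per executor
```

The same values can also be passed to spark-submit with --num-executors and --executor-cores.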

Upvotes: 1
