lisak

Reputation: 21981

Spark job hanging for 12 minutes after it finishes

I'm executing a Spark job on a single executor in standalone mode. It runs as expected, but it always gets stuck at the end for 12 minutes, with both the executor and driver Docker containers essentially idling for that period. Here you can see the job and the logs:

https://gist.github.com/l15k4/25588d35a6c786b4ade514739c0195ee

I'm printing out the stats, which happens 12 minutes after the job finishes according to what I can see in the Web UI... Any idea what might be the cause?

It takes 20 minutes instead of 8...

After those initial 8 minutes, nload shows minimal upload/download traffic and top shows only a 3-5% CPU load in the driver container. Apart from that, everything idles...

Upvotes: 1

Views: 1094

Answers (1)

lisak

Reputation: 21981

Turns out that those extra 12 minutes are spent merging the Hadoop _temporary files on S3 into the target files. It is very inefficient: both download and upload don't get above 3 Mbit/s, even though downloading the input files from S3 runs at 70 Mbit/s.

In other words, rdd.saveAsHadoopFile("s3a://...") is extremely inefficient and should really only be used for low volumes of data ...
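
Not from the original answer, but as a possible mitigation: the slow phase is the job-commit rename pass of Hadoop's FileOutputCommitter, which copies every file out of the _temporary directory on S3 into its final location. Setting the committer to algorithm version 2 makes each task commit its files directly to the final output path, so that extra merge at job commit is skipped. A minimal sketch, assuming standalone Spark with the s3a filesystem already configured; the app name, bucket and paths are placeholders:

import org.apache.spark.{SparkConf, SparkContext}

// Configure the Hadoop FileOutputCommitter before creating the SparkContext.
// Algorithm version 2 moves each task's output to the final directory at task
// commit time, avoiding the second rename/merge pass at job commit, which is
// what makes writes to S3 so slow here.
val conf = new SparkConf()
  .setAppName("s3a-write-example")
  .set("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")

val sc = new SparkContext(conf)

// Placeholder data and bucket; the write call itself is unchanged.
val rdd = sc.parallelize(Seq("a", "b", "c"))
rdd.saveAsTextFile("s3a://some-bucket/output/")

On newer Hadoop versions the dedicated S3A committers (e.g. the "magic" committer) avoid the rename step entirely and are usually the better fix; alternatively, write to HDFS or local disk first and upload the finished files to S3 in a separate step.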

Upvotes: 1
