Reputation: 83
A Spark job running in YARN mode shows a few tasks failing with the following reason:
ExecutorLostFailure (executor 36 exited caused by one of the running tasks) Reason: Container marked as failed: container_xxxxxxxxxx_yyyy_01_000054 on host: ip-xxx-yy-zzz-zz. Exit status: -100. Diagnostics: Container released on a *lost* node
Any idea why this is happening?
Upvotes: 1
Views: 6619
Reputation: 3544
I understand your cluster is not on AWS, but since AWS manages EMR clusters, they have published knowledge-center articles on this error:
For Glue job: https://aws.amazon.com/premiumsupport/knowledge-center/container-released-lost-node-100-glue/
For EMR: https://aws.amazon.com/premiumsupport/knowledge-center/emr-exit-status-100-lost-node/
Upvotes: 0
Reputation: 4156
There are two main reasons for this exit status.

First, the executor runs out of off-heap memory; increase spark.executor.memoryOverhead for the job.

Second, the node's disk fills up, typically because shuffle and spill data accumulate under the YARN local directory (/mnt/yarn/usercache/). Run df -h on the node to check your remaining disk space.
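A minimal sketch of both checks, assuming the job is launched with spark-submit in YARN cluster mode; the overhead value and the script name my_job.py are illustrative placeholders, not values from the question:

# Raise executor off-heap overhead (value is illustrative; tune for your workload)
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.executor.memoryOverhead=2g \
  my_job.py

# On the affected node, check how much space is left on the YARN local dirs
df -h /mnt/yarn/usercache/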
Upvotes: 5
Reputation: 31
Containers killed by the framework, either because they were released by the application or because they were 'lost' due to node failures, get the special exit code -100. The node failure could be caused by insufficient disk space or executor memory.
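To confirm which node was lost and why, you can query YARN directly; the commands below are standard YARN CLI calls, assuming log aggregation is enabled and application_xxxxxxxxxx_yyyy stands in for your actual application ID:

# List all nodes with their state; LOST or UNHEALTHY entries point to the failed host
yarn node -list -all

# Fetch the aggregated container logs for the application to see the failure details
yarn logs -applicationId application_xxxxxxxxxx_yyyy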
Upvotes: 0