Nandita Dwivedi

Reputation: 83

Spark tasks fail with error, showing exit status: -100

The Spark job, running in YARN mode, shows a few tasks failed with the following reason:

ExecutorLostFailure (executor 36 exited caused by one of the running tasks) Reason: Container marked as failed: container_xxxxxxxxxx_yyyy_01_000054 on host: ip-xxx-yy-zzz-zz. Exit status: -100. Diagnostics: Container released on a *lost* node

Any idea why this is happening?

Upvotes: 1

Views: 6619

Answers (3)

Sachin

Reputation: 3544

I understand your cluster is not on AWS, but since AWS manages EMR clusters, they have released FAQs on this error:

For Glue jobs: https://aws.amazon.com/premiumsupport/knowledge-center/container-released-lost-node-100-glue/

For EMR: https://aws.amazon.com/premiumsupport/knowledge-center/emr-exit-status-100-lost-node/

Upvotes: 0

DennisLi

Reputation: 4156

There are two main reasons.

  1. It may be because the memoryOverhead needed by the YARN container is not enough; the solution is to increase spark.executor.memoryOverhead (see the sketch after this list).
  2. It may also be that the worker node's disk lacks the space to write the temporary data required by Spark. Check your YARN usercache dir (on EMR it is located at /mnt/yarn/usercache/), or run df -h to check the remaining disk space.
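Below is a minimal, hedged sketch of the first fix, assuming Spark 2.3+ on YARN (where the property is named spark.executor.memoryOverhead; older releases use spark.yarn.executor.memoryOverhead). The 2g value is only an example and should be tuned to your workload:

    import org.apache.spark.sql.SparkSession

    object MemoryOverheadExample {
      def main(args: Array[String]): Unit = {
        // Extra off-heap memory reserved per executor container on YARN.
        // The default is max(384 MB, 10% of spark.executor.memory); raise it
        // if containers are lost with exit status -100. "2g" is an example.
        val spark = SparkSession.builder()
          .appName("memory-overhead-example")
          .config("spark.executor.memoryOverhead", "2g")
          .getOrCreate()

        // ... run your job here ...

        spark.stop()
      }
    }

Equivalently, the same property can be passed at submit time with spark-submit --conf spark.executor.memoryOverhead=2g.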

Upvotes: 5

BigDataGuru

Reputation: 31

Containers killed by the framework, either because they were released by the application or because they were 'lost' due to node failures, have the special exit code -100. The node failure could be caused by insufficient disk space or executor memory.

Upvotes: 0
