Knight71

Reputation: 2959

Spark worker dies after running for some duration

I am running a Spark streaming job.

My cluster config:

Spark version - 1.6.1
Spark node config:
cores - 4
memory - 6.8 G (out of 8G)
number of nodes - 3

For my job, I am giving 6 GB of memory per node and 3 cores in total.
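For reference, the submission looks roughly like this (the master URL and jar name below are placeholders, not my actual values):

    spark-submit \
      --master spark://master-host:7077 \
      --executor-memory 6G \
      --total-executor-cores 3 \
      my-streaming-job.jar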

After the job has been running for an hour, I get the following error in the worker log:

    Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f53b496a000, 262144, 0) failed; error='Cannot allocate memory' (errno=12)
    #
    # There is insufficient memory for the Java Runtime Environment to continue.
    # Native memory allocation (mmap) failed to map 262144 bytes for committing reserved memory.
    # An error report file with more information is saved as:
    # /usr/local/spark/sbin/hs_err_pid1622.log

However, I don't see any errors in my work-dir/app-id/stderr.

What are the -Xm* settings usually recommended for running a Spark worker?

How can I debug this issue further?

PS: I started my worker and master with the default settings.

Update:

I see that my executors are being added and removed frequently because of the "cannot allocate memory" error.

log:

  16/06/24 12:53:47 INFO MemoryStore: Block broadcast_53 stored as values in memory (estimated size 14.3 KB, free 440.8 MB)
  16/06/24 12:53:47 INFO BlockManager: Found block rdd_145_1 locally
  16/06/24 12:53:47 INFO BlockManager: Found block rdd_145_0 locally
  Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f3440743000, 12288, 0) failed; error='Cannot allocate memory' (errno=12)

Upvotes: 4

Views: 4879

Answers (1)

Fang

Reputation: 136

I ran into the same situation. I found the reason in the official documentation, which says:

In general, Spark can run well with anywhere from 8 GB to hundreds of gigabytes of memory per machine. In all cases, we recommend allocating only at most 75% of the memory for Spark; leave the rest for the operating system and buffer cache.

Your machine has 8 GB of memory and 6 GB of it is given to the worker node. So if the operating system uses more than 2 GB, there is not enough memory left for the worker, and the worker is lost. Just check how much memory the operating system actually uses, and allocate the rest to the worker node.
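For example, if the OS and buffer cache on an 8 GB machine need around 3 GB, you could cap the worker below that limit in conf/spark-env.sh and ask for no more than that when submitting (the values below are only an illustration; measure your own OS usage first):

    # conf/spark-env.sh on every worker node (restart the worker afterwards)
    export SPARK_WORKER_MEMORY=5g
    export SPARK_WORKER_CORES=4

    # ask for at most that much per node when submitting the job
    spark-submit --executor-memory 5g --total-executor-cores 3 your-app.jar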

Upvotes: 1
