Victor

Reputation: 2546

HDFS Datanode crashes with OutOfMemoryError

I'm seeing repeated crashes of the HDFS DataNodes in my Cloudera cluster due to an OutOfMemoryError:

java.lang.OutOfMemoryError: Java heap space
Dumping heap to /tmp/hdfs_hdfs-DATANODE-e26e098f77ad7085a5dbf0d369107220_pid18551.hprof ...
Heap dump file created [2487730300 bytes in 16.574 secs]
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="/usr/lib64/cmf/service/common/killparent.sh"
#   Executing /bin/sh -c "/usr/lib64/cmf/service/common/killparent.sh"...
18551 TS   19 ?        00:25:37 java
Wed Aug  7 11:44:54 UTC 2019
JAVA_HOME=/usr/lib/jvm/java-openjdk
using /usr/lib/jvm/java-openjdk as JAVA_HOME
using 5 as CDH_VERSION
using /run/cloudera-scm-agent/process/3087-hdfs-DATANODE as CONF_DIR
using  as SECURE_USER
using  as SECURE_GROUP
CONF_DIR=/run/cloudera-scm-agent/process/3087-hdfs-DATANODE
CMF_CONF_DIR=/etc/cloudera-scm-agent
4194304

When analyzing the heap dump, the biggest suspects are millions of ScanInfo instances apparently queued in the ExecutorService of the org.apache.hadoop.hdfs.server.datanode.DirectoryScanner class.

Eclipse MAT tool showing the dominator tree

When I inspect the contents of each ScanInfo runnable object, I don't see anything unusual:

ScanInfo instance content

Apart from this and a somewhat high block count in HDFS, I have no other information beyond the different DataNodes crashing randomly across the cluster.
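
In case it matters, I'm reading the block count off the standard fsck summary; a quick sketch of how it can be checked, run as the HDFS superuser (assumed here to be hdfs):

$ sudo -u hdfs hdfs fsck /
# the summary at the end of the output reports "Total files" and "Total blocks (validated)"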

Any idea why these objects keep queueing up in the DirectoryScanner thread pool?
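
For reference, I believe the scanner's schedule and parallelism come from these hdfs-site.xml properties (property names as in hdfs-default.xml; the defaults in the comments are as I recall them, and I haven't overridden them). This is one way to check what a node actually resolves:

$ hdfs getconf -confKey dfs.datanode.directoryscan.interval   # seconds between DirectoryScanner runs, default 21600
$ hdfs getconf -confKey dfs.datanode.directoryscan.threads    # threads used to build the scan report, default 1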

Upvotes: 0

Views: 1149

Answers (1)

Avinash Tripathy

Reputation: 601

You can try running the command below.

$ hadoop dfsadmin -finalizeUpgrade

The -finalizeUpgrade command removes the previous version of the NameNode's and DataNodes' storage directories.
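
A minimal usage sketch, assuming the HDFS superuser is hdfs (the hadoop dfsadmin entry point is deprecated in newer releases; hdfs dfsadmin is equivalent). It should only have an effect if an earlier HDFS upgrade was left unfinalized:

$ sudo -u hdfs hdfs dfsadmin -finalizeUpgrade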

Upvotes: 1
