Victor

Reputation: 2546

HDFS Datanode crashes with OutOfMemoryError

I'm seeing repeated crashes of the HDFS DataNodes in my Cloudera cluster due to an OutOfMemoryError:

java.lang.OutOfMemoryError: Java heap space
Dumping heap to /tmp/hdfs_hdfs-DATANODE-e26e098f77ad7085a5dbf0d369107220_pid18551.hprof ...
Heap dump file created [2487730300 bytes in 16.574 secs]
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="/usr/lib64/cmf/service/common/killparent.sh"
#   Executing /bin/sh -c "/usr/lib64/cmf/service/common/killparent.sh"...
18551 TS   19 ?        00:25:37 java
Wed Aug  7 11:44:54 UTC 2019
JAVA_HOME=/usr/lib/jvm/java-openjdk
using /usr/lib/jvm/java-openjdk as JAVA_HOME
using 5 as CDH_VERSION
using /run/cloudera-scm-agent/process/3087-hdfs-DATANODE as CONF_DIR
using  as SECURE_USER
using  as SECURE_GROUP
CONF_DIR=/run/cloudera-scm-agent/process/3087-hdfs-DATANODE
CMF_CONF_DIR=/etc/cloudera-scm-agent
4194304

When analyzing the heap dump, the biggest suspects are millions of ScanInfo instances apparently queued in the ExecutorService of the org.apache.hadoop.hdfs.server.datanode.DirectoryScanner class.

Eclipse MAT tool showing the dominator tree

When I inspect the contents of each ScanInfo runnable object, I don't see anything unusual:

ScanInfo instance content

Apart from this and a somewhat high block count in HDFS, I have no other information beyond the different DataNodes crashing randomly across the cluster.
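
In case it matters, I'm reading the block count off the standard fsck summary; a quick sketch of how it can be checked, run as the HDFS superuser (assumed here to be hdfs):

$ sudo -u hdfs hdfs fsck /
# the summary at the end of the output reports "Total files" and "Total blocks (validated)"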

Any idea why these objects keep queueing up in the DirectoryScanner thread pool?
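
For reference, I believe the scanner's schedule and parallelism come from these hdfs-site.xml properties (property names as in hdfs-default.xml; the defaults in the comments are as I recall them, and I haven't overridden them). This is one way to check what a node actually resolves:

$ hdfs getconf -confKey dfs.datanode.directoryscan.interval   # seconds between DirectoryScanner runs, default 21600
$ hdfs getconf -confKey dfs.datanode.directoryscan.threads    # threads used to build the scan report, default 1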

Upvotes: 0

Views: 1149

Answers (1)

Avinash Tripathy

Reputation: 601

You can try running the command below.

$ hadoop dfsadmin -finalizeUpgrade

The -finalizeUpgrade command removes the previous version of the NameNode's and DataNodes' storage directories.
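
A minimal usage sketch, assuming the HDFS superuser is hdfs (the hadoop dfsadmin entry point is deprecated in newer releases; hdfs dfsadmin is equivalent). It should only have an effect if an earlier HDFS upgrade was left unfinalized:

$ sudo -u hdfs hdfs dfsadmin -finalizeUpgrade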

Upvotes: 1
