HbnKing

Reputation: 1882

datanode failed with org.apache.hadoop.util.DiskChecker$DiskErrorException

Recently I installed Hadoop and formatted the namenode. The namenode started fine, but the datanodes failed to start. Here is the datanode error log:

STARTUP_MSG:   build = [email protected]:hortonworks/hadoop.git -r 3091053c59a62c82d82c9f778c48bde5ef0a89a1; compiled by 'jenkins' on 2018-05-11T07:53Z
STARTUP_MSG:   java = 1.8.0_181
************************************************************/
2018-10-17 15:08:42,769 INFO  datanode.DataNode (LogAdapter.java:info(47)) - registered UNIX signal handlers for [TERM, HUP, INT]
2018-10-17 15:08:43,665 INFO  checker.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(122)) - Scheduling a check for [DISK]file:/hadoop/hdfs/data/
2018-10-17 15:08:43,682 ERROR datanode.DataNode (DataNode.java:secureMain(2692)) - Exception in secureMain
org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid value configured for dfs.datanode.failed.volumes.tolerated - 1. Value configured is >= to the number of configured volumes (1).
    at org.apache.hadoop.hdfs.server.datanode.checker.StorageLocationChecker.check(StorageLocationChecker.java:174)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2584)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2493)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2540)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2685)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2709)
2018-10-17 15:08:43,688 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2018-10-17 15:08:43,696 INFO  datanode.DataNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at hdp2.com/192.168.100.12

What does "dfs.datanode.failed.volumes.tolerated - 1" mean? What caused this error?

Upvotes: 1

Views: 8767

Answers (3)

BetterCallMe

Reputation: 768

Remove the following property from the hdfs-site.xml file:

<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///C:/hadoop-3.3.0/data/datanode</value>
</property>

Upvotes: 0

HbnKing

Reputation: 1882

When I tried to solve this problem, I searched the source code:

// Number of volume failures the DataNode is configured to tolerate.
final int volFailuresTolerated =
    conf.getInt(DFSConfigKeys.DFS_DATANODE_FAILED_VOLUMES_TOLERATED_KEY,
                DFSConfigKeys.DFS_DATANODE_FAILED_VOLUMES_TOLERATED_DEFAULT);

String[] dataDirs = conf.getTrimmedStrings(DFSConfigKeys.DFS_DATANODE_DATA_DIR_KEY);

// Volumes configured in dfs.datanode.data.dir vs. volumes that actually loaded.
int volsConfigured = (dataDirs == null) ? 0 : dataDirs.length;
int volsFailed = volsConfigured - storage.getNumStorageDirs();
this.validVolsRequired = volsConfigured - volFailuresTolerated;

// The tolerated value must be non-negative and strictly less than the
// number of configured volumes; this is the check that failed here.
if (volFailuresTolerated < 0 || volFailuresTolerated >= volsConfigured) {
  throw new DiskErrorException("Invalid volume failure "
      + " config value: " + volFailuresTolerated);
}
if (volsFailed > volFailuresTolerated) {
  throw new DiskErrorException("Too many failed volumes - "
      + "current valid volumes: " + storage.getNumStorageDirs()
      + ", volumes configured: " + volsConfigured
      + ", volumes failed: " + volsFailed
      + ", volume failures tolerated: " + volFailuresTolerated);
}

As you can see, the official description of this property is:

The number of volumes that are allowed to fail before a datanode stops offering service. By default any volume failure will cause a datanode to shutdown.

That is, it is the number of damaged disks that a DataNode can tolerate.

In a Hadoop cluster, read-only or corrupted disks are common. At startup, the DataNode uses the directories configured under dfs.datanode.data.dir to store blocks. If more of those directories are unusable than the tolerance configured above allows, the DataNode will fail to start.

In my Hadoop environment, dfs.datanode.data.dir is configured with a single disk, yet dfs.datanode.failed.volumes.tolerated was set to 1, which would allow one disk to be bad. With only one volume, volFailuresTolerated and volsConfigured are both 1, so the check volFailuresTolerated >= volsConfigured is true and the DataNode throws the exception above.
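
Concretely, the fix for a single-volume node is to lower the tolerance below the number of configured volumes. Here is a minimal hdfs-site.xml sketch (the data directory path matches my log above; adjust it to your environment):

<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///hadoop/hdfs/data</value>
</property>
<property>
  <!-- Must be >= 0 and strictly less than the number of configured
       volumes, so with a single volume the only valid value is 0. -->
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>0</value>
</property>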

Upvotes: 1

Rahim Dastar
Rahim Dastar

Reputation: 1269

Check hdfs-site.xml. This property must be set to 0 or higher, and it must stay strictly below the number of configured data volumes:

dfs.datanode.failed.volumes.tolerated

The number of volumes that are allowed to fail before a datanode stops offering service. By default any volume failure will cause a datanode to shutdown.
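
For example, with a single configured data directory, 0 is the only valid value (a sketch; place it inside the <configuration> element of hdfs-site.xml):

<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>0</value>
</property>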

Upvotes: 0
