Reputation: 19854
I'm getting the following error when attempting to write to HDFS as part of my multi-threaded application
could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
I've tried the top-rated answer here about reformatting, but it doesn't work for me: HDFS error: could only be replicated to 0 nodes, instead of 1
What is happening is this: each writer thread does its writing through its own PartitionTextFileWriter.
Thread 1 and 2 will not be writing to the same file, although they do share a parent directory at the root of my directory tree.
There are no problems with disk space on my server.
I also see this in my name-node logs, but I'm not sure what it means:
2016-03-15 11:23:12,149 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 1 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
2016-03-15 11:23:12,150 WARN org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough replicas: expected size is 1 but only 0 storage types can be selected (replication=1, selected=[], unavailable=[DISK], removed=[DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
2016-03-15 11:23:12,150 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 1 (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All required storage types are unavailable: unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
2016-03-15 11:23:12,151 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 10.104.247.78:52004 Call#61 Retry#0
java.io.IOException: File /metrics/abc/myfile could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
[2016-03-15 13:34:16,663] INFO [Group Metadata Manager on Broker 0]: Removed 0 expired offsets in 1 milliseconds. (kafka.coordinator.GroupMetadataManager)
What could be the cause of this error?
Thanks
Upvotes: 36
Views: 53062
Reputation: 162
1. Check your firewall status; you can simply stop the firewall on both master and slaves: systemctl stop firewalld. This fixed my problem.
2. Delete the namenode dir and the datanode dir and reformat (my slave machines didn't shut down normally, which left the datanode broken), then run hdfs namenode -format.
Run jps on both master and slaves and make sure the master has a NameNode and the slaves have a DataNode. For reference, the whole sequence is sketched below.
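A minimal sketch of that sequence, assuming the default storage locations under /tmp/hadoop-<user> (adjust the paths to whatever dfs.namenode.name.dir and dfs.datanode.data.dir actually point at on your cluster, and note that this wipes the existing HDFS data):
stop-dfs.sh                        # stop NameNode and DataNodes first
rm -rf /tmp/hadoop-*/dfs/name      # namenode dir on the master (assumed default path)
rm -rf /tmp/hadoop-*/dfs/data      # datanode dir on every slave (assumed default path)
hdfs namenode -format              # reformat the namenode
start-dfs.sh
jps                                # master should show NameNode, slaves should show DataNode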
Upvotes: 0
Reputation: 1
Maybe the number of your DataNodes is too small (less than 3). I put 3 IP addresses in hadoop/etc/hadoop/slaves, and it works!
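For reference, the slaves file is just one DataNode host per line; the addresses below are placeholders for your own nodes:
# hadoop/etc/hadoop/slaves
192.168.1.101
192.168.1.102
192.168.1.103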
Upvotes: 0
Reputation: 11
Got this error because the DataNode was not running. To resolve this on the VM, I had to start the DataNode.
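A minimal way to check and start it, assuming a standard Hadoop install on the VM:
jps                                # DataNode should appear in the list
hdfs --daemon start datanode       # Hadoop 3.x; on 2.x use: hadoop-daemon.sh start datanode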
Upvotes: 0
Reputation: 83
Another reason could be that your Datanode machine hasn't exposed the port (50010 by default). In my case, I was trying to write a file from Machine1 to HDFS running in a Docker container C1 which was hosted on Machine2. For the host machine to forward the requests to the services running in the container, the port forwarding has to be taken care of. I could resolve the issue after forwarding port 50010 from the host machine to the guest machine.
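For example, when the container is started, the DataNode transfer port can be published explicitly (the image name here is a placeholder):
docker run -d --name C1 -p 50010:50010 my-hadoop-image   # forwards Machine2:50010 to the DataNode inside C1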
Upvotes: 5
Reputation: 1213
In my case the problem was Hadoop temporary files.
Logs were showing the following error:
2019-02-27 13:52:01,079 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /tmp/hadoop-i843484/dfs/data/in_use.lock acquired by nodename 28111@slel00681841a
2019-02-27 13:52:01,087 WARN org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: Incompatible clusterIDs in /tmp/hadoop-i843484/dfs/data: namenode clusterID = CID-38b0104b-d3d2-4088-9a54-44b71b452006; datanode clusterID = CID-8e121bbb-5a08-4085-9817-b2040cd399e1
I solved it by removing the Hadoop tmp files:
sudo rm -r /tmp/hadoop-*
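A slightly fuller version of the same fix, assuming the data really lives under /tmp/hadoop-<user> as in the log above (this discards the DataNode's blocks, so only do it on data you can regenerate):
stop-dfs.sh
sudo rm -r /tmp/hadoop-*/dfs/data   # only the datanode dir holding the stale clusterID
start-dfs.sh                        # the datanode recreates the dir and adopts the namenode's clusterID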
Upvotes: 1
Reputation: 23
I too had the same error; then I changed the block size, and that resolved the problem.
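If you want to try the same thing, the block size can be overridden per write without editing hdfs-site.xml, something like:
hdfs dfs -D dfs.blocksize=134217728 -put localfile /metrics/abc/   # 128 MB blocks for this copy only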
Upvotes: 1
Reputation: 22651
You may leave HDFS safe mode:
hdfs dfsadmin -safemode forceExit
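To check whether safe mode is actually the problem first, and to try the gentler exit before forcing it:
hdfs dfsadmin -safemode get     # reports whether safe mode is ON or OFF
hdfs dfsadmin -safemode leave   # normal exit; use forceExit only if this refuses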
Upvotes: 2
Reputation: 31
In my case it was the storage policy of the output path being set to COLD.
How to check the settings of your folder:
hdfs storagepolicies -getStoragePolicy -path my_path
In my case it returned:
The storage policy of my_path:
BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}
I dumped the data elsewhere (to HOT storage) and the issue went away.
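Alternatively, if that path is not supposed to be COLD at all, the policy can be switched in place so that new writes target DISK again (my_path as above):
hdfs storagepolicies -setStoragePolicy -path my_path -policy HOT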
Upvotes: 2
Reputation: 5888
I had the same error; restarting the HDFS services solved the issue, i.e. I restarted the NameNode and DataNode services.
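Depending on how the cluster is managed, that restart could look something like this (assuming the standard sbin scripts are on the PATH):
stop-dfs.sh && start-dfs.sh                                    # whole-cluster restart
hdfs --daemon stop datanode && hdfs --daemon start datanode    # or per daemon on Hadoop 3.x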
Upvotes: 3
Reputation: 12321
Check if the jps command on the computers which run the datanodes shows that the datanodes are running. If they are running, then it means that they could not connect with the namenode, and hence the namenode thinks there are no datanodes in the hadoop system.
In such a case, after running start-dfs.sh, run netstat -ntlp on the master node. 9000 is the port number most tutorials tell you to specify in core-site.xml. So if you see a line like this in the output of netstat:
tcp 0 0 127.0.1.1:9000 0.0.0.0:* LISTEN 4209/java
then you have a problem with the host alias. I had the same problem, so I'll state how it was resolved.
These are the contents of my core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://vm-sm:9000</value>
</property>
</configuration>
So the vm-sm alias on the master computer maps to 127.0.1.1. This is because of the setup of my /etc/hosts file:
127.0.0.1 localhost
127.0.1.1 vm-sm
192.168.1.1 vm-sm
192.168.1.2 vm-sw1
192.168.1.3 vm-sw2
It looks like the core-site.xml of the master system mapped the namenode to 127.0.1.1:9000, while the worker nodes were trying to connect through 192.168.1.1:9000.
So I had to change the alias of the master node for the hadoop system (I just removed the hyphen) in the /etc/hosts file:
127.0.0.1 localhost
127.0.1.1 vm-sm
192.168.1.1 vmsm
192.168.1.2 vm-sw1
192.168.1.3 vm-sw2
and reflected the change in the core-site.xml, mapred-site.xml, and slaves files (wherever the old alias of the master occurred).
After deleting the old hdfs files from the hadoop location as well as the tmp folder and restarting all nodes, the issue was solved.
Now, netstat -ntlp after starting DFS returns:
tcp 0 0 192.168.1.1:9000 0.0.0.0:* LISTEN ...
...
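As a final sanity check after such a change, this should list one live datanode per worker, confirming they can now reach the namenode:
hdfs dfsadmin -report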
Upvotes: 3
Reputation: 1721
I had a similar issue recently. As my datanodes (only) had SSDs for storage, I put [SSD]file:///path/to/data/dir for the dfs.datanode.data.dir configuration. Due to the logs containing unavailableStorages=[DISK] I removed the [SSD] tag, which solved the problem.
Apparently, Hadoop uses [DISK] as the default storage type, and does not 'fall back' (or rather 'fall up') to using SSD if no [DISK]-tagged storage location is available. I could not find any documentation on this behaviour though.
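If you want to confirm what the datanode is actually configured with before editing anything, the effective value (including any [SSD]/[DISK] tag) can be printed with:
hdfs getconf -confKey dfs.datanode.data.dir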
Upvotes: 1
Reputation: 7960
This error is caused by the block replication system of HDFS, since it could not manage to place any copy of a specific block of the file being written. Common reasons for that include DataNodes that are not running or cannot be reached by the NameNode, DataNodes that have run out of disk space, and the NameNode being stuck in safe mode.
Also please verify that the NameNode and DataNode services are up and examine their logs for the failing write.
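A quick way to check several of those from the client side (using the path from the question as an example):
hdfs fsck /metrics/abc -files -blocks -locations   # shows where the file's blocks were (or were not) placed
df -h                                              # run on each datanode to rule out a full disk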
Ref: https://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo
Also, please check: Writing to HDFS from Java, getting "could only be replicated to 0 nodes instead of minReplication"
Upvotes: 37