Reputation: 22479
I am doing a POC with Hadoop 2.9.0 as a distributed file storage system, so I have set up a multi-node cluster with 1 namenode and 4 datanodes (the master included) and a replication factor of 2.
Now, after a series of copy operations, I decided to stop one of the datanodes (slave2). Then, while slave2 was still down, I cleaned up a few GB of data using the hdfs dfs -rm -skipTrash
command.
Later I restarted the slave2 datanode that I had stopped, and it seems it did not clean up the data blocks that were deleted from HDFS during its downtime.
I continued adding/deleting more data to see if slave2 would sync up with the master namenode and perform the local cleanup to reclaim the disk space, but it didn't.
Below is the data consumption on each node:
slave2:
hduser@slave2:~$ hdfs dfs -du -s -h /
4.5 G /
hduser@slave2:~$ du -sh /hadoop-tmp/
7.7G /hadoop-tmp/ [<-- notice extra 2.2 GB of data present on local disk]
master:
hduser@master:~$ du -sh /hadoop-tmp/
4.6G /hadoop-tmp/
hduser@master:~$ hdfs dfs -du -s -h /
4.5 G /
slave1:
hduser@slave1:~$ hdfs dfs -du -s -h /
4.5 G /
hduser@slave1:~$ du -sh /hadoop-tmp/
4.5G /hadoop-tmp/
slave3:
hduser@slave3:/$ du -sh /hadoop-tmp/
2.8G /hadoop-tmp/
hduser@slave3:/$ hdfs dfs -du -s -h /
4.5 G /
I guess my question here is: how long will the slave2 datanode take to sync up with the master namenode, recognize that it is locally storing blocks that have been deleted from the HDFS cluster, and clean them up? And if that does happen over time, can we control the duration of that sync-up?
And if that is not going to happen, what is the process for reclaiming disk space from datanodes that went down and came back up after a while?
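(For reference, in case it is relevant: the closest setting I could find is the datanode's full block report interval in hdfs-site.xml. I am not sure this is the knob that actually controls the cleanup, so treat it as an assumption.)
<!-- hdfs-site.xml: how often a datanode sends its full block report to the namenode; the default is 6 hours -->
<property>
  <name>dfs.blockreport.intervalMsec</name>
  <value>21600000</value>
</property>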
Upvotes: 0
Views: 962
Reputation: 11
You might consider running an fsck to identify the inconsistent blocks on your cluster, and then take the necessary action to delete the blocks left behind if the data is no longer intended to be retained.
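For example, a rough sketch (check the options against your Hadoop version before running anything destructive; the file path below is a placeholder):
# Report overall health and list files, blocks, and their datanode locations
hdfs fsck / -files -blocks -locations
# List any corrupt file blocks
hdfs fsck / -list-corruptfileblocks
# If a file is confirmed corrupt and the data is expendable, remove it
hdfs fsck /path/to/file -delete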
Upvotes: 1