sharath

Reputation: 21

HDFS under-replicated blocks to file mapping

HDFS shows around 600K under-replicated blocks on a cluster after a rack failure. Is there a way to know which files would be affected if these blocks were lost, before HDFS recovers? I can't run 'fsck /' because the cluster is very large.

Upvotes: 2

Views: 387

Answers (2)

Abhinav

Reputation: 676

There is a simpler way to get this information.

Just run

hdfs dfsadmin -metasave <filename>

and all of the NameNode metadata, including the file paths for the under-replicated blocks, will be dumped to <filename> in the directory specified by hadoop.log.dir on the NameNode; you can inspect that file directly.

This seems the better option to me.
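
Once the dump is available, a short script can pull out the affected paths. A minimal sketch, assuming the dump was produced as hdfs dfsadmin -metasave metasave.out and copied from the NameNode's hadoop.log.dir; the exact line format for blocks awaiting replication varies by Hadoop version, so the pattern below is only a heuristic:

# Minimal sketch: pull the affected file paths out of a dfsadmin -metasave dump.
# Assumption: entries for blocks awaiting replication look roughly like
# "/path/to/file: blk_... (replicas: ...)"; adjust the pattern for your Hadoop version.
import re
import sys

def affected_files(dump_path):
    files = set()
    with open(dump_path) as dump:
        for line in dump:
            # Heuristic match: an absolute path, a block id, then a replica summary.
            match = re.match(r"^(/\S+):\s+blk_\S+\s+\(replicas:", line)
            if match:
                files.add(match.group(1))
    return sorted(files)

if __name__ == "__main__":
    # e.g. python metasave_files.py metasave.out (hypothetical script/file names)
    for path in affected_files(sys.argv[1]):
        print(path)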

Upvotes: 0

Lakshman Battini

Reputation: 1912

The NameNode UI lists the missing blocks, and the JMX output lists the corrupted/missing blocks, but both report only the number of under-replicated blocks, not the affected files.
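
If you only need those counters, you can read them straight from the NameNode's JMX servlet. A minimal sketch, assuming a NameNode HTTP address of <NN_HOST>:<HTTP_PORT> and the FSNamesystem bean; metric names can differ slightly between Hadoop versions:

# Minimal sketch: read block counters from the NameNode JMX servlet.
# <NN_HOST>:<HTTP_PORT> is the NameNode HTTP address; metric names may vary by version.
import json
from urllib.request import urlopen

url = "http://<NN_HOST>:<HTTP_PORT>/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem"
bean = json.load(urlopen(url))["beans"][0]
for key in ("UnderReplicatedBlocks", "MissingBlocks", "CorruptBlocks"):
    print(key, bean.get(key))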

There are two ways to view the under-replicated blocks/files: fsck or the WebHDFS API.

Using the WebHDFS REST API:

curl -i  "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=LISTSTATUS"

This returns a response containing a FileStatuses JSON object. Parse the JSON and filter for the files whose replication is less than the configured value (see the sketch after the sample response below).

A sample response returned from the NameNode:

curl -i "http://<NN_HOST>:<HTTP_PORT>/webhdfs/v1/<PATH_OF_DIRECTORY>?op=LISTSTATUS"
HTTP/1.1 200 OK
Cache-Control: no-cache
Content-Type: application/json
Transfer-Encoding: chunked
Server: Jetty(6.1.26.hwx)

{"FileStatuses":{"FileStatus":[
{"accessTime":1489059994224,"blockSize":134217728,"childrenNum":0,"fileId":209158298,"group":"hdfs","length":0,"modificationTime":1489059994227,"owner":"XXX","pathSuffix":"_SUCCESS","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1489059969939,"blockSize":134217728,"childrenNum":0,"fileId":209158053,"group":"hdfs","length":0,"modificationTime":1489059986846,"owner":"XXX","pathSuffix":"part-m-00000","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1489059982614,"blockSize":134217728,"childrenNum":0,"fileId":209158225,"group":"hdfs","length":0,"modificationTime":1489059993497,"owner":"XXX","pathSuffix":"part-m-00001","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1489059977524,"blockSize":134217728,"childrenNum":0,"fileId":209158188,"group":"hdfs","length":0,"modificationTime":1489059983034,"owner":"XXX","pathSuffix":"part-m-00002","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"}]}}

If the directory contains many files, you can also list them iteratively using ?op=LISTSTATUS_BATCH&startAfter=<CHILD> (a paging sketch follows the reference below).

Reference: https://hadoop.apache.org/docs/r3.1.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Iteratively_List_a_Directory
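
A minimal paging sketch under the same placeholder assumptions, using the DirectoryListing structure that LISTSTATUS_BATCH returns (remainingEntries plus a partial FileStatus list), with the last pathSuffix of each page fed back as startAfter:

# Minimal sketch: walk a large directory with LISTSTATUS_BATCH, feeding the last
# pathSuffix of each page back as startAfter until remainingEntries reaches zero.
import json
from urllib.parse import quote
from urllib.request import urlopen

base = "http://<NN_HOST>:<HTTP_PORT>/webhdfs/v1/<PATH_OF_DIRECTORY>?op=LISTSTATUS_BATCH"

start_after = None
while True:
    url = base + ("&startAfter=" + quote(start_after) if start_after else "")
    listing = json.load(urlopen(url))["DirectoryListing"]
    entries = listing["partialListing"]["FileStatuses"]["FileStatus"]
    for status in entries:
        print(status["pathSuffix"], status["replication"])
    if listing["remainingEntries"] == 0 or not entries:
        break
    start_after = entries[-1]["pathSuffix"]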

Upvotes: 2
