AnneTheAgile

Reputation: 10270

How to see entire root hdfs disk usage? (hadoop dfs -du / gets subfolders)

Perhaps unsurprisingly, given how fascinating big data is to the business, we have a disk space issue we'd like to monitor on our Hadoop clusters.

I have a cron job running, and it does just what I want except that I'd like one of the output lines to show the overall space used. In other words, in bash, the very last line of a "du /" command shows the total usage for all the subfolders on the entire disk. I'd like that behavior.

Currently, however, when I run "hadoop dfs -du /", I get only the per-subdirectory figures and not the overall total.
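For comparison, here is the plain bash behavior I mean (the path and sizes below are just illustrative):

$ du /tmp/example
8       /tmp/example/sub
16      /tmp/example

That final line, the grand total for the starting directory, is the piece missing from the hadoop output.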

What's the best way to get this? Thank you so much to all you super Stack Overflow people :).

Upvotes: 1

Views: 1591

Answers (2)

Venkat

Reputation: 1810

hadoop fs -du -s -h /path

This will give you the summary.
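For example, run against the HDFS root (this assumes a Hadoop 2.x or later client, where du supports the -s and -h flags; the figure shown is illustrative):

$ hadoop fs -du -s -h /
1.2 T  /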

For the whole cluster, you can try:

hdfs dfsadmin -report

You may need to run this as the HDFS user.
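On clusters where the superuser account is named hdfs (an assumption; yours may differ), that would be:

$ sudo -u hdfs hdfs dfsadmin -report

The first lines of the report give the cluster-wide Configured Capacity, DFS Used, and DFS Remaining figures.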

Upvotes: 0

AnneTheAgile

Reputation: 10270

I just hadn't understood the docs correctly! Here is the answer for getting the total space used:

$ hadoop dfs -dus /
hdfs://MYSERVER.com:MYPORT/ 999
$ # capture the two fields (path, then byte count) into a bash array
$ array=(`hadoop dfs -dus /`)
$ echo $array
hdfs://MYURL:MYPORT/
$ # swap the fields so the byte count comes first
$ echo ${array[1]} ${array[0]}
999 hdfs://MYURL:MYPORT/

Reference: File System Shell Guide, http://hadoop.apache.org/docs/r1.2.1/file_system_shell.html#du. Edit: also corrected the order of reporting to match the original.
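For a cron job, the same reordering can be done in a single pipeline with awk instead of a bash array, since -dus prints the path first and the byte count second (a sketch based on the output above):

$ hadoop dfs -dus / | awk '{print $2, $1}'
999 hdfs://MYURL:MYPORT/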

Upvotes: 1
