Reputation: 10270
Perhaps unsurprisingly, given how fascinating big data is to the business, we have a disk space issue we'd like to monitor on our Hadoop clusters.
I have a cron job running and it does just what I want, except that I'd like one of the output lines to show the overall space used. In bash, the very last line of a "du /" command shows the total usage for all the subfolders on the disk; I'd like that behavior.
Currently, however, when I run "hadoop dfs -du /" I get only the per-subdirectory figures and not the overall total.
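For reference, this is the local-disk behavior I mean; du -s collapses everything into the one total line (999 is just an illustrative size):

$ du -s /
999	/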
What's the best way to get this? Thank you so much to all you Super Stack Overflow people :).
Upvotes: 1
Views: 1591
Reputation: 1810
hadoop fs -du -s -h /path
This will give you the summary.
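For example, a minimal cron-friendly sketch (the exact output columns vary by Hadoop version; this assumes the size is the first field):

hadoop fs -du -s -h /path   # human-readable summary, e.g. "1.2 T  /path"
# grab the raw byte count for logging (assumes size is the first field)
total=$(hadoop fs -du -s /path | awk '{print $1}')
echo "HDFS used under /path: ${total} bytes"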
For the whole cluster you can try:
hdfs dfsadmin -report
You may need to run this as the HDFS user.
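For example, assuming sudo access and that the superuser account is named hdfs:

sudo -u hdfs hdfs dfsadmin -report | grep 'DFS Used'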
Upvotes: 0
Reputation: 10270
I just didn't understand the docs correctly! Here is the answer to get the total space used:
$ hadoop dfs -dus /
hdfs://MYSERVER.com:MYPORT/ 999
$ array=(`hadoop dfs -dus /`)
$ echo $array
hdfs://MYSERVER.com:MYPORT/
$ echo ${array[1]} ${array[0]}
999 hdfs://MYSERVER.com:MYPORT/
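A tidier one-liner for the same thing, as a sketch (assumes the size is the second whitespace-separated field, as in the output above):

$ hadoop dfs -dus / | awk '{print $2, $1}'
999 hdfs://MYSERVER.com:MYPORT/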
Reference: File System Shell Guide, http://hadoop.apache.org/docs/r1.2.1/file_system_shell.html#du
Edit: Also corrected the order of reporting to match the original.
Upvotes: 1