Reputation: 63
We have a small HBase cluster on EC2 with 6 region servers. We recently found that the data in one of the column families is not really useful to us and decided to drop it. This column family takes up more than 50 percent of the space on disk. We altered the table, removed the column family, and ran a major compaction. We also ran major compactions on the '-ROOT-' and '.META.' tables. But there is still no reduction in the total DFS file size. Are we missing something here? Any help or pointers would be greatly appreciated.
regards.
Upvotes: 4
Views: 1775
Reputation: 6361
Just to add another thing to check: in HBase 0.90.4 at least, dropping a table removes its files from HDFS, but the contents of the .logs directory are not necessarily removed.
For example, run hadoop fs -du /yourHbaseDirInDFS
and you will see that the .logs directory still holds a chunk of data. This does not seem to go away until the HBase cluster is restarted. Alternatively, I guess you could delete the log files manually, but it seems better to me to let HBase do it.
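To make the largest directories stand out in that listing, something like the following might help. It is only a rough sketch: `rank_du` is a made-up helper name, and it assumes each `hadoop fs -du` line starts with a byte count and ends with the path, which can differ between Hadoop versions.

```shell
# Hypothetical helper: rank `hadoop fs -du` output read from stdin, largest first.
# Assumption: the first field is the size in bytes and the last field is the path.
# Non-numeric lines such as "Found N items" are skipped.
rank_du() {
  awk '$1 ~ /^[0-9]+$/ { print $1, $NF }' | sort -rn
}

# Usage on a live cluster (the path is a placeholder for your HBase root dir):
#   hadoop fs -du /yourHbaseDirInDFS | rank_du
```

If .logs shows up near the top, the space is probably being held by write-ahead logs rather than by the table's store files.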
Upvotes: 2
Reputation: 63
Got it! It was a bug in HBase: it does not delete the files from HDFS. We had to find and delete the files from the Hadoop file system ourselves.
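For anyone hitting the same thing, here is roughly how the leftover files can be located before deleting anything. This is a sketch, not the exact commands we ran: `find_cf_paths`, the table path, and the family name `dropped_cf` are placeholders, and it assumes the store files live under a directory named after the column family inside each region directory.

```shell
# Hypothetical helper: from `hadoop fs -lsr` output on stdin, print the paths
# that contain a directory named after the dropped column family.
# Assumptions: the path is the last field of each line, and the family name
# contains no regex metacharacters.
find_cf_paths() {
  awk -v cf="$1" '$NF ~ ("/" cf "(/|$)") { print $NF }'
}

# Usage (dry run only, nothing is deleted; path and family name are placeholders):
#   hadoop fs -lsr /yourHbaseDirInDFS/yourTable | find_cf_paths dropped_cf
```

Review the printed paths carefully before removing anything by hand.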
Upvotes: 1