Reputation: 2801
We have a Hadoop/Hive cluster of 2 servers. On each server the Hive database uses ~160 GB of disk space, but the Hadoop data directory is ~850 GB.
Is this normal, and what is the typical ratio between the Hive database size and the Hadoop data directory size?
Upvotes: 0
Views: 1781
Reputation: 1569
/dfs/dn is the DataNode data directory, i.e. where the HDFS blocks held by that server are stored. Its size includes the space occupied by Hive tables as well as everything else in HDFS.
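To see how much of that space is actually Hive data, one option is to compare the Hive warehouse directory with the whole filesystem. Below is a minimal Python sketch, assuming the output format of `hdfs dfs -du -s <path>` on Hadoop 2.8+/3.x, which prints the logical size, the space consumed including all replicas, and the path, all in bytes. `/user/hive/warehouse` is the usual default warehouse location but may differ on your cluster, and the sample lines are hypothetical, not measurements.

```python
def parse_du_line(line: str) -> tuple[int, int, str]:
    """Parse one line of `hdfs dfs -du -s` output (Hadoop 2.8+ format).

    Returns (logical_bytes, bytes_including_replicas, path).
    """
    # Split at most twice so that paths containing spaces stay intact.
    logical, with_replicas, path = line.split(maxsplit=2)
    return int(logical), int(with_replicas), path


def hive_share(warehouse_line: str, root_line: str) -> float:
    """Fraction of raw HDFS space (replicas included) used by the Hive warehouse."""
    _, hive_raw, _ = parse_du_line(warehouse_line)
    _, total_raw, _ = parse_du_line(root_line)
    return hive_raw / total_raw


if __name__ == "__main__":
    # Hypothetical output of `hdfs dfs -du -s /user/hive/warehouse` and `hdfs dfs -du -s /`.
    warehouse = "100  300  /user/hive/warehouse"
    root = "400  1200  /"
    print(f"Hive uses {hive_share(warehouse, root):.0%} of HDFS")  # prints "Hive uses 25% of HDFS"
```

The second column of `du /` roughly corresponds to the sum of the /dfs/dn directories across all DataNodes, so it is the more natural number to compare against the on-disk figure.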
If you use Hadoop only to store Hive data, consider creating external tables. These store only metadata in Hive and reuse the data already stored in HDFS folders, whereas an internal (managed) table keeps its data under the Hive warehouse directory, so data that was copied there rather than moved ends up stored twice.
Upvotes: 2
Reputation: 3845
This depends entirely on the type of data you store. The data in your Hive databases is simply part of the Hadoop data directory. If you store only Hive table data in Hadoop, the ratio would be 1:1.
There is no fixed relation between Hive database size and Hadoop data directory size: HDFS is a superset in which all data, including Hive databases, is stored.
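One caveat when comparing on-disk numbers is block replication: each DataNode directory holds raw block replicas, so if the Hive size was measured as a logical size it is not directly comparable to /dfs/dn. Below is a rough Python sketch, assuming data is spread evenly across DataNodes and ignoring the small checksum (.meta) files and any non-HDFS usage. `dfs.replication` (default 3) is the relevant HDFS setting; the function name is hypothetical.

```python
def per_datanode_bytes(total_logical_bytes: int, replication: int, num_datanodes: int) -> float:
    """Rough average amount of block data in each DataNode directory.

    HDFS does not place two replicas of the same block on one DataNode,
    so the effective replica count is capped at the number of DataNodes.
    Assumes an evenly balanced cluster (hypothetical helper).
    """
    effective_replicas = min(replication, num_datanodes)
    return total_logical_bytes * effective_replicas / num_datanodes


# Hypothetical example: 400 GB of logical HDFS data, replication factor 3.
print(per_datanode_bytes(400, 3, 2))  # 2 DataNodes -> 400.0 GB each
print(per_datanode_bytes(400, 3, 4))  # 4 DataNodes -> 300.0 GB each
```

On a two-node cluster with the default replication factor, each DataNode therefore ends up holding roughly one full copy of everything in HDFS, not just the Hive data.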
Upvotes: 2