Napas

Reputation: 2801

Typical Hive and Hadoop disk space usage

We have a Hadoop/Hive cluster of 2 servers. On each server the Hive database uses ~160GB of disk space, but the Hadoop data directory is ~850GB.

Is this normal, and what is the typical ratio between Hive database size and Hadoop data directory size?

Upvotes: 0

Views: 1781

Answers (2)

Abhishek Pathak

Reputation: 1569

/dfs/dn is the DataNode data directory, i.e., the local disk space that HDFS uses on that server. It includes the space occupied by Hive tables as well as everything else stored in HDFS.

If you use Hadoop only to store Hive data, consider creating external tables. An external table stores only metadata in Hive and reuses the data already sitting in an HDFS directory. An internal (managed) table instead keeps its data under Hive's own warehouse directory, which can leave you with a second copy if the source files are kept as well.

Upvotes: 2

Amar

Reputation: 3845

This depends entirely on what you store in HDFS. The data in your Hive databases is itself part of the Hadoop data directory. If Hive tables are the only thing you store in Hadoop, the ratio would be roughly 1:1.

Otherwise there is no fixed ratio between Hive database size and Hadoop data directory size. HDFS is a superset in which all data, including the Hive databases, is stored.
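
To see what else is taking up space in HDFS besides Hive, one option is to list per-directory sizes with `hdfs dfs -du` and work out each directory's share of the total. The Python below is only a rough sketch of that idea: the helper names `parse_du` and `share_by_path` are hypothetical, the sample listing is invented for illustration (it is not output from the cluster in the question), and `/user/hive/warehouse` is assumed to be the Hive warehouse location, which is the usual default but may differ on your setup.

```python
def parse_du(output):
    """Parse the text printed by `hdfs dfs -du <dir>` into {path: bytes}.

    Assumes the first column is the logical size in bytes and the last
    column is the path. Some Hadoop versions print an extra middle column
    (space consumed including replicas); it is ignored here. Paths that
    contain spaces are not handled by this sketch.
    """
    sizes = {}
    for line in output.strip().splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        sizes[parts[-1]] = int(parts[0])
    return sizes


def share_by_path(sizes):
    """Return each path's fraction of the summed size."""
    total = sum(sizes.values())
    if total == 0:
        return {}
    return {path: size / total for path, size in sizes.items()}


# Invented sample listing, for illustration only.
SAMPLE = """\
2000  4000  /user/hive/warehouse
5000  10000  /tmp
3000  6000  /user/other
"""

if __name__ == "__main__":
    shares = share_by_path(parse_du(SAMPLE))
    for path, frac in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{frac:6.1%}  {path}")
```

On a real cluster you would feed it the actual command output instead of `SAMPLE`, for example the `stdout` of `subprocess.run(["hdfs", "dfs", "-du", "/"], capture_output=True, text=True)`. If the Hive warehouse directory turns out to be a small fraction of the total, the remaining directories in the listing show where the rest of the space goes.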

Upvotes: 2
