ankitbeohar90

Reputation: 107

How Hive partitioning works

I want to know how Hive partitioning works. I know the concept, but I am trying to understand how it works internally and how data gets stored in the correct partition. Let's say I have a table with a dynamic partition on year, and I have ingested data from 2013. How does Hive create the partitions and store each row in the correct one?

Upvotes: 0

Views: 1640

Answers (2)

Travis

Reputation: 119

If the table is not partitioned, all the data is stored in one directory in no particular order. If the table is partitioned (e.g. by year), the data is stored in separate directories, one per year. For a non-partitioned table, when you want to fetch the data for year=2010, Hive has to scan the whole table to find the 2010 records. If the table is partitioned, Hive only reads the year=2010 directory, which is faster and uses less I/O.
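The directory layout described above can be imitated in a few lines of Python. This is only a rough sketch of the idea, not Hive's actual implementation: the helper names `write_partitioned` and `read_partition`, the CSV format, and the `part-00000.csv` file name are all assumptions made for illustration. It shows the two halves of the mechanism: on write, each row is routed to a `year=<value>` directory based on the value of its partition column (which is roughly what a dynamic-partition insert does), and on read, a filter on the partition column only touches the matching directory.

```python
import csv
import tempfile
from pathlib import Path


def write_partitioned(rows, table_dir, partition_col):
    """Route each row to <table_dir>/<partition_col>=<value>/part-00000.csv.

    Hypothetical helper: mimics how a dynamic-partition insert picks the
    target directory from the value found in each row.
    """
    table_dir = Path(table_dir)
    for row in rows:
        row = dict(row)  # copy so the caller's data is not mutated
        # The partition value is encoded in the directory name,
        # so it is removed from the row before writing the data file.
        value = row.pop(partition_col)
        part_dir = table_dir / f"{partition_col}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)  # created on first use
        data_file = part_dir / "part-00000.csv"
        is_new = not data_file.exists()
        with data_file.open("a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=sorted(row))
            if is_new:
                writer.writeheader()
            writer.writerow(row)


def read_partition(table_dir, partition_col, value):
    """Read only the directory for one partition value (partition pruning)."""
    part_dir = Path(table_dir) / f"{partition_col}={value}"
    if not part_dir.is_dir():
        return []  # no such partition, so nothing to scan
    rows = []
    for data_file in sorted(part_dir.glob("*.csv")):
        with data_file.open(newline="") as f:
            for row in csv.DictReader(f):
                row[partition_col] = value  # restore the column from the path
                rows.append(row)
    return rows


if __name__ == "__main__":
    sample = [
        {"id": 1, "name": "a", "year": 2013},
        {"id": 2, "name": "b", "year": 2014},
        {"id": 3, "name": "c", "year": 2013},
    ]
    with tempfile.TemporaryDirectory() as table_dir:
        write_partitioned(sample, table_dir, "year")
        print(sorted(p.name for p in Path(table_dir).iterdir()))
        print(read_partition(table_dir, "year", 2013))
```

Note that `read_partition` never opens the files of other years. That is the whole benefit: the filter is answered by choosing a directory instead of scanning rows.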

Upvotes: 1

Abhijit Bashetti

Reputation: 8658

Hive organizes tables into partitions. Partitioning is a way of dividing a table into related parts based on the values of partition columns, such as a date.

Partitions, apart from being storage units, also allow the user to efficiently identify the rows that satisfy certain criteria.

Using partitions, it is easy to query just a portion of the data.

Tables or partitions can be further subdivided into buckets, which give the data extra structure that may be used for more efficient querying. Bucketing is based on the value of a hash function applied to a column of the table.
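As a rough illustration of the bucketing idea, here is a minimal Python sketch that assigns rows to buckets by hashing a column and taking the result modulo the number of buckets. The function names `bucket_for` and `assign_buckets` are hypothetical, and CRC32 is only a stand-in hash chosen because it is deterministic. Hive uses its own hash function, so the actual bucket numbers would differ.

```python
import zlib
from collections import defaultdict


def bucket_for(value, num_buckets):
    """Return the bucket index for a value: hash(value) mod num_buckets.

    CRC32 is a stand-in here (an assumption for illustration),
    not the hash function Hive itself uses.
    """
    h = zlib.crc32(str(value).encode("utf-8"))
    return h % num_buckets


def assign_buckets(rows, column, num_buckets):
    """Group rows by the bucket that their `column` value hashes to."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_for(row[column], num_buckets)].append(row)
    return dict(buckets)


if __name__ == "__main__":
    employees = [{"emp_id": i, "name": f"emp{i}"} for i in range(10)]
    for bucket, members in sorted(assign_buckets(employees, "emp_id", 4).items()):
        print(bucket, [m["emp_id"] for m in members])
```

Because the same value always hashes to the same bucket, a lookup or join on the bucketed column only needs to consider one bucket instead of all of them.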

Suppose you need to retrieve the details of all employees who joined in 2012. On a non-partitioned table, the query has to search the whole table for the required information. However, if you partition the employee data by year so that each year is stored in its own directory, the query only has to read the 2012 partition, which reduces the query processing time.

Upvotes: 0
