Reputation: 107
I wanna know how hive partitioning works I know the concept but I am trying to understand how its working and store the in exact partition. Let say I have a table and I have created partition on year its dynamic, ingested data from 2013 so how hive create partition and store the exact data in exact partition.
Upvotes: 0
Views: 1640
Reputation: 119
If the table is not partitioned, all the data is stored in one directory without order. If the table is partitioned(eg. by year) data are stored separately in different directories. Each directory is corresponding to one year. For a non-partitioned table, when you want to fetch the data of year=2010, hive have to scan the whole table to find out the 2010-records. If the table is partitioned, hive just go to the year=2010 directory. More faster and IO efficient
Upvotes: 1
Reputation: 8658
Hive organizes tables into partitions. It is a way of dividing a table into related parts based on the values of partitioned columns such as date.
Partitions
- apart from being storage units - also allow the user to efficiently identify the rows that satisfy a certain criteria.
Using partition, it is easy to query a portion of the data.
Tables or partitions are sub-divided into buckets, to provide extra structure to the data that may be used for more efficient querying. Bucketing works based on the value of hash function of some column of a table.
Suppose you need to retrieve the details of all employees who joined in 2012. A query searches the whole table for the required information. However, if you partition the employee data with the year and store it in a separate file, it reduces the query processing time.
Upvotes: 0