Sohil Jain

Reputation: 331

Advantage of creating Hive partitions when using parquet file storage

Is there any advantage to creating Hive partitions when using Parquet file storage? Parquet is a columnar storage file format that stores data in column chunks, with all the columns stored sequentially by index. When we select a column based on a predicate, the reader jumps to the required range of that column based on the predicate and returns the values. How will partitioning be helpful? In row-oriented Hive tables, partitioning is helpful because we hit only the required range of data, but I'm not able to understand how it helps with Parquet storage.

Upvotes: 2

Views: 2223

Answers (1)

Ankeeta Sawant

Reputation: 1

In a non-partitioned table, Hive has to read all the files in the table's data directory and then apply the filter to them. For a large table this is slow and expensive. In a partitioned table, Hive creates subdirectories based on the partition column. This splits the data horizontally, so a query that filters on the partition column does not need to search the entire table for a few records. The Parquet file format has better compression, but its performance is not that good. Partitioning a Parquet table reduces query execution time. For example, when I executed a filter query on a Parquet table it took 29.657 seconds, whereas on a partitioned Parquet table it took 14.21 seconds. For a large table, partitioning will definitely improve query performance.
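To make the directory-skipping idea concrete, here is a rough Python sketch of it. This is not Hive or Parquet code. It is a toy that writes small CSV files into Hive-style `column=value` subdirectories and counts how many files a filtered scan has to open. The names `write_partitioned`, `scan`, `demo`, and the sample `ROWS` are all made up for illustration.

```python
import csv
import os
import tempfile

# Hypothetical sample data; "country" plays the role of the partition column.
ROWS = [
    {"country": "IN", "user": "a", "amount": "10"},
    {"country": "IN", "user": "b", "amount": "20"},
    {"country": "US", "user": "c", "amount": "30"},
    {"country": "DE", "user": "d", "amount": "40"},
]


def write_partitioned(root, rows, partition_col):
    """Write rows into subdirectories named <partition_col>=<value>."""
    groups = {}
    for row in rows:
        groups.setdefault(row[partition_col], []).append(row)
    for value, group in groups.items():
        part_dir = os.path.join(root, f"{partition_col}={value}")
        os.makedirs(part_dir, exist_ok=True)
        # The partition value is encoded in the directory name, not in the file.
        fields = [k for k in group[0] if k != partition_col]
        with open(os.path.join(part_dir, "part-0.csv"), "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fields)
            writer.writeheader()
            for row in group:
                writer.writerow({k: row[k] for k in fields})


def scan(root, partition_col, wanted=None):
    """Read the table; return (rows, number_of_files_opened).

    If `wanted` is given, directories for other partition values are
    skipped without opening their files (the pruning step).
    """
    rows, files_opened = [], 0
    for name in sorted(os.listdir(root)):
        col, _, value = name.partition("=")
        if col != partition_col:
            continue
        if wanted is not None and value != wanted:
            continue  # pruned: this partition's files are never read
        with open(os.path.join(root, name, "part-0.csv"), newline="") as f:
            files_opened += 1
            for row in csv.DictReader(f):
                row[partition_col] = value  # restore value from the path
                rows.append(row)
    return rows, files_opened


def demo():
    """Compare a full scan plus in-memory filter against a pruned scan."""
    with tempfile.TemporaryDirectory() as root:
        write_partitioned(root, ROWS, "country")
        all_rows, opened_full = scan(root, "country")
        filtered = [r for r in all_rows if r["country"] == "IN"]
        pruned_rows, opened_pruned = scan(root, "country", wanted="IN")
    return {
        "opened_full": opened_full,
        "opened_pruned": opened_pruned,
        "filtered_users": sorted(r["user"] for r in filtered),
        "pruned_users": sorted(r["user"] for r in pruned_rows),
    }


if __name__ == "__main__":
    print(demo())
```

Both scans return the same rows for `country = 'IN'`, but the pruned scan opens fewer files. The columnar layout decides how much of each file is read, while partitioning decides which files are opened at all, so the two savings stack.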

Upvotes: 0
