Guy Kahlon

Reputation: 4520

Hive file formats advantages and disadvantages

I am starting to work with Hive. I would like to know which kinds of queries each table format is best suited for, among these formats: RCFile, ORC, Parquet, and delimited text.

Upvotes: 1

Views: 5347

Answers (3)

Rahul

Reputation: 2384

I see that there are a couple of answers, but since your question didn't ask about any particular file format, each answer addresses only one format or another.

There are a number of file formats that you can use in Hive. Notable ones are Avro, Parquet, RCFile, and ORC. There are some good documents available online that you can refer to if you want to compare the performance and space utilization of these file formats. Here are some useful links to get you started:

This Blog Post

This link from MapR [They don't discuss Parquet though]

This link from Inquidia

The links above should get you going. I hope this answers your question.

Thanks!

Upvotes: 1

Alen

Reputation: 174

For the ORC file format, have a look at the Hive documentation, which has a detailed description: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC

The Parquet file format stores data in columnar form. For example, take this table:

Col1  Col2
A     1
B     2
C     3

A row-oriented format stores this data as A1B2C3. Parquet stores it column by column, as ABC123. For more on the Parquet file format, have a read of https://blog.twitter.com/2013/dremel-made-simple-with-parquet
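
To make the difference concrete, here is a minimal Python sketch of the two layouts using the same three-row example. The function names are made up for illustration, and this is only a toy model: real Parquet files also use row groups, encodings, and compression.

```python
# Toy illustration only: real Parquet files add row groups,
# encodings, and compression on top of this idea.
rows = [("A", 1), ("B", 2), ("C", 3)]  # (Col1, Col2)


def row_layout(rows):
    """Serialize row by row: all values of a row are stored together."""
    return "".join(str(value) for row in rows for value in row)


def column_layout(rows):
    """Serialize column by column: all values of a column are stored together."""
    columns = zip(*rows)  # transpose rows into columns
    return "".join(str(value) for column in columns for value in column)


print(row_layout(rows))     # A1B2C3
print(column_layout(rows))  # ABC123
```

With the column layout, a query that needs only Col1 can read the ABC part and skip the rest.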

Upvotes: 1

kris433

Reputation: 414

When you have tables with a very large number of columns and you tend to query only specific columns, the RCFile format would be a good choice. Rather than reading entire rows of data, Hive retrieves just the required columns, which saves time. The data is divided into groups of rows, and each row group is then stored column by column.
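
As a rough sketch of that idea (not RCFile's actual on-disk format), the Python below splits rows into groups and stores each group column by column, so reading one column touches only that column's slice in each group. The helper names and the group size of 2 are assumptions chosen for illustration.

```python
def to_row_groups(rows, group_size):
    """Split rows into groups, then store each group column by column."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        # Transpose the chunk so each inner list holds one column's values.
        groups.append([list(column) for column in zip(*chunk)])
    return groups


def read_column(groups, index):
    """Collect one column by reading only its slice from every row group."""
    values = []
    for group in groups:
        values.extend(group[index])
    return values


rows = [("A", 1, "x"), ("B", 2, "y"), ("C", 3, "z"), ("D", 4, "w")]
groups = to_row_groups(rows, group_size=2)
print(groups)
print(read_column(groups, 1))
```

Here `read_column(groups, 1)` never looks at the first or third column, which is where the time saving on wide tables comes from.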

Delimited text is the general-purpose file format, and it is what Hive uses by default when no other format is specified.

Upvotes: 1
