Reputation: 4520
I'm starting to work with Hive. I want to know which queries I should use for each of these table formats: RCFile, ORC, Parquet, and delimited text.
Upvotes: 1
Views: 5347
Reputation: 2384
I see there are already a couple of answers, but since your question didn't ask about any particular file format, each answer covers only one format or another.
There are several file formats you can use in Hive. Notable ones are Avro, Parquet, RCFile, and ORC. There are good documents online that you can refer to if you want to compare the performance and space utilization of these formats. Here are some useful links to get you going:
This link from MapR (it doesn't discuss Parquet, though)
I hope this answers your question.
Thanks!
Upvotes: 1
Reputation: 174
For the ORC file format, have a look at the Hive documentation, which has a detailed description here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC
The Parquet file format stores data in columnar form. For example, take this table:

Col1  Col2
A     1
B     2
C     3

A row-oriented format stores this data as A1B2C3, whereas Parquet stores it column by column, as ABC123. For more on the Parquet file format, read https://blog.twitter.com/2013/dremel-made-simple-with-parquet
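To make the A1B2C3 vs ABC123 example concrete, here is a toy Python sketch that writes the same rows both ways. It only illustrates the ordering idea; real Parquet files also apply encodings, compression, and metadata, all of which this ignores.

```python
# Sample rows from the Col1/Col2 example: (Col1, Col2).
rows = [("A", "1"), ("B", "2"), ("C", "3")]

def row_layout(rows):
    """Row-oriented: write each row's fields together, one row after another."""
    return "".join(value for row in rows for value in row)

def column_layout(rows):
    """Column-oriented: write every value of Col1, then every value of Col2."""
    columns = zip(*rows)  # transpose rows into columns
    return "".join(value for column in columns for value in column)

print(row_layout(rows))     # A1B2C3
print(column_layout(rows))  # ABC123
```

Because each column's values sit next to each other, a query that needs only Col2 can read just that contiguous stretch instead of skipping over Col1 in every row.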
Upvotes: 1
Reputation: 414
When you have tables with a very large number of columns and you frequently query only specific columns, the RCFile format is a good choice. Rather than reading entire rows, Hive retrieves only the required columns, which saves time. The data is first split into groups of rows (row groups), and within each row group the values are stored column by column.
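The row-group layout described above can be sketched in a few lines of Python. This is a toy model, not the real RCFile on-disk format: the group size of 2 rows and the sample rows are made up for illustration, and real RCFile adds metadata and compression that this ignores.

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, ...]

def rcfile_like_layout(rows: Sequence[Row], rows_per_group: int) -> List[List[List[str]]]:
    """Split rows into row groups, then store each group column by column.

    Returns a list of row groups; each row group is a list of columns,
    and each column holds that column's values within the group.
    """
    groups = []
    for start in range(0, len(rows), rows_per_group):
        group = rows[start:start + rows_per_group]          # horizontal split
        groups.append([list(col) for col in zip(*group)])   # vertical split
    return groups

def read_column(groups: List[List[List[str]]], col_index: int) -> List[str]:
    """Read one column by touching only that column in each row group."""
    values: List[str] = []
    for group in groups:
        values.extend(group[col_index])
    return values

# Hypothetical 3-column table with 4 rows.
rows = [("A", "1", "x"), ("B", "2", "y"), ("C", "3", "z"), ("D", "4", "w")]
groups = rcfile_like_layout(rows, rows_per_group=2)
print(groups)
# [[['A', 'B'], ['1', '2'], ['x', 'y']], [['C', 'D'], ['3', '4'], ['z', 'w']]]
print(read_column(groups, 1))  # ['1', '2', '3', '4']
```

Keeping whole rows inside one group means a full row can still be reassembled from a single group, while a query on one column skips the other columns' data.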
Delimited text is the general-purpose (and default) file format: each row is a line of text, with fields separated by a delimiter character.
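On the original question of which queries to use: in Hive, SELECT queries are written the same way regardless of the table's storage format; what changes is the storage clause in CREATE TABLE. The Python sketch below just builds the HiveQL strings to show that. The table names, the `id`/`name` columns, and the comma delimiter are hypothetical; `STORED AS ORC` and `STORED AS PARQUET` assume a reasonably recent Hive (Parquet support via `STORED AS PARQUET` arrived in Hive 0.13).

```python
# Hypothetical columns, for illustration only.
COLUMNS = "id INT, name STRING"

# Only the storage clause differs between formats.
STORAGE_CLAUSES = {
    "delimited text": "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\nSTORED AS TEXTFILE",
    "rcfile": "STORED AS RCFILE",
    "orc": "STORED AS ORC",
    "parquet": "STORED AS PARQUET",
}

def create_table_ddl(table: str, fmt: str) -> str:
    """Build a Hive CREATE TABLE statement for the given storage format."""
    return f"CREATE TABLE {table} ({COLUMNS})\n{STORAGE_CLAUSES[fmt]};"

def select_query(table: str) -> str:
    """The query itself does not depend on the table's storage format."""
    return f"SELECT name FROM {table} WHERE id > 10;"

for fmt in STORAGE_CLAUSES:
    table = "users_" + fmt.replace(" ", "_")  # hypothetical table name per format
    print(create_table_ddl(table, fmt))
    print(select_query(table))
    print()
```

One practical difference: LOAD DATA only moves files into the table's location without converting them, so text data is usually loaded into a delimited-text staging table first and then copied into an ORC, Parquet, or RCFile table with INSERT INTO ... SELECT.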
Upvotes: 1