tooptoop4

Reputation: 330

Convert ORC file to Parquet file

Are there any known libraries or approaches for converting ORC files to Parquet files? Otherwise, I am thinking of using Spark to read the ORC file into a DataFrame and then write it out as a Parquet file.

Upvotes: 3

Views: 9713

Answers (2)

Rahul

Reputation: 2374

You mentioned using Spark to read the ORC files into DataFrames and then store those DataFrames as Parquet files. This is a perfectly valid and quite efficient approach!
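
A minimal PySpark sketch of that read-then-write flow is below. It assumes you already have a `SparkSession` available; the function names, the example paths, and the helper that derives an output path are hypothetical and only there for illustration. `spark.read.orc` and `DataFrame.write.parquet` are the standard Spark reader and writer calls.

```python
import posixpath


def default_parquet_path(orc_path):
    """Hypothetical helper: swap a trailing .orc suffix for .parquet."""
    trimmed = orc_path.rstrip("/")
    base, ext = posixpath.splitext(trimmed)
    if ext.lower() != ".orc":
        base = trimmed  # no .orc suffix, so just append .parquet
    return base + ".parquet"


def convert_orc_to_parquet(spark, orc_path, parquet_path=None, mode="error"):
    """Read ORC data into a DataFrame and write it back out as Parquet.

    `spark` is an existing pyspark.sql.SparkSession. `mode` is passed to
    DataFrameWriter.mode ("error", "overwrite", "append" or "ignore").
    """
    if parquet_path is None:
        parquet_path = default_parquet_path(orc_path)
    df = spark.read.orc(orc_path)              # schema is taken from the ORC metadata
    df.write.mode(mode).parquet(parquet_path)  # Spark writes a directory of part files
    return parquet_path


# Example usage (paths are made up):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("orc-to-parquet").getOrCreate()
#   convert_orc_to_parquet(spark, "/data/events.orc")
```

Note that Spark writes the Parquet output as a directory of part files rather than a single file, so the target path ends up being a folder.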

Depending on your preference and use case, you can also use Hive or Pig (maybe with Tez thrown in for better performance), Java MapReduce, or even NiFi/StreamSets (depending on your distribution). The conversion is very straightforward to implement, so use whatever suits you best or whatever you are most comfortable with :)

Upvotes: 2

gaurav Sharma

Reputation: 34

One way of doing this is:

Step 1) First, create a table from the ORC table with STORED AS TEXTFILE.
Step 2) Then create a table from that output with STORED AS PARQUET.
Step 3) After that, you can drop the intermediate table.
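
The three steps above can be sketched as HiveQL statements, built here in Python so they can be run through `spark.sql` or pasted into a Hive client. The table names are hypothetical, and running them through Spark assumes a `SparkSession` created with Hive support enabled. The intermediate text table may not be strictly necessary: a single `CREATE TABLE ... STORED AS PARQUET AS SELECT` directly from the ORC table should also work.

```python
def hive_conversion_statements(orc_table, text_table, parquet_table):
    """Build the HiveQL for the ORC -> text -> Parquet steps (table names are up to you)."""
    return [
        # Step 1: copy the ORC table into an intermediate text-format table
        f"CREATE TABLE {text_table} STORED AS TEXTFILE AS SELECT * FROM {orc_table}",
        # Step 2: copy the intermediate table into a Parquet-format table
        f"CREATE TABLE {parquet_table} STORED AS PARQUET AS SELECT * FROM {text_table}",
        # Step 3: drop the intermediate table
        f"DROP TABLE {text_table}",
    ]


def run_statements(spark, statements):
    """Run each statement in order on a Hive-enabled SparkSession."""
    for stmt in statements:
        spark.sql(stmt)


# Example usage (assumes a session built with .enableHiveSupport()):
#   stmts = hive_conversion_statements("events_orc", "events_txt", "events_parquet")
#   run_statements(spark, stmts)
```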

Upvotes: 1
