Reputation: 330
Are there any known libraries/approaches for converting ORC files to Parquet files? Otherwise, I am thinking of using Spark to import an ORC file into a DataFrame and then write it out as a Parquet file.
Upvotes: 3
Views: 9713
Reputation: 2374
You mentioned using Spark to read ORC files, create DataFrames, and then store those DataFrames as Parquet files. That is a perfectly valid and quite efficient approach!
Depending on your preference and your use case, you can also use Hive or Pig (perhaps throwing in Tez for better performance), plain Java MapReduce, or even NiFi/StreamSets (depending on your distribution). The conversion itself is a very straightforward implementation, so do it with whatever suits you best (or whatever you are most comfortable with :)).
Upvotes: 2
Reputation: 34
One way of doing this in Hive is:
Step 1) Create an intermediate table from the ORC table with "STORED AS TEXTFILE".
Step 2) Create a table from that intermediate table with "STORED AS PARQUET".
Step 3) Drop the intermediate table.
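The steps above can be sketched in HiveQL; the table names (`orc_table`, `tmp_text`, `parquet_table`) are placeholders for illustration:

```sql
-- Step 1: stage the ORC data as an intermediate text table
CREATE TABLE tmp_text STORED AS TEXTFILE AS SELECT * FROM orc_table;

-- Step 2: create the Parquet table from the intermediate table
CREATE TABLE parquet_table STORED AS PARQUET AS SELECT * FROM tmp_text;

-- Step 3: drop the intermediate table
DROP TABLE tmp_text;
```

Note that Hive can usually create the Parquet table directly from the ORC table (`CREATE TABLE parquet_table STORED AS PARQUET AS SELECT * FROM orc_table;`), so the text-file stage can often be skipped.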
Upvotes: 1