subhash padala

Reputation: 11

Storing Avro data in ORC format in HDFS without using Hive

I'm comparing storing Avro data in ORC and Parquet format. I succeeded in storing Avro data into Parquet using "com.twitter" % "parquet-avro" % "1.6.0", but I am unable to find any information or API to store the Avro data in ORC format.
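For context, the Parquet side of the comparison can be done directly with parquet-avro's `AvroParquetWriter`, without Spark or Hive. This is a minimal sketch; the schema, record values, and output path are made up for illustration (in 1.6.0 the classes live under the old `parquet.avro` package, not `org.apache.parquet.avro`):

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.hadoop.fs.Path
import parquet.avro.AvroParquetWriter // from "com.twitter" % "parquet-avro" % "1.6.0"

// Hypothetical Avro schema, assumed for illustration
val schema = new Schema.Parser().parse(
  """{"type":"record","name":"User","fields":[{"name":"name","type":"string"}]}""")

// Writer bound to the Avro schema; path is an example only
val writer = new AvroParquetWriter[GenericRecord](new Path("/output/users.parquet"), schema)

val record = new GenericData.Record(schema)
record.put("name", "subhash")
writer.write(record)
writer.close()
```

At the time, ORC had no equivalent standalone writer library, which is what makes the question non-trivial.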

Is ORC tightly coupled to Hive only?

Thanks, subhash

Upvotes: 1

Views: 2305

Answers (1)

Ewan Leith

Reputation: 1665

You haven't said you're using Spark, but the question is tagged with it, so I assume you are.

The ORC file format is currently heavily tied to the HiveContext in Spark (and I think only available in 1.4 and up), but if you create a Hive context, you should be able to write dataframes to ORC files the same way you can with Parquet, for example:

import org.apache.spark.sql._
import com.databricks.spark.avro._ // adds the .avro method to DataFrameReader

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
val df = sqlContext.read.avro("/input/path")
df.write.format("orc").save("/path/to/use")

If you're reading the Avro data via the Spark dataframes API, then that's all you should need, but there are more details on the Hortonworks blog

Upvotes: 2
