Reputation: 11
I am comparing storing Avro data in ORC and Parquet formats. I succeeded in storing Avro data in Parquet using "com.twitter" % "parquet-avro" % "1.6.0", but I am unable to find any information or API to store the Avro data in ORC format.
Is ORC tightly coupled with Hive only?
Thanks, subahsh
Upvotes: 1
Views: 2305
Reputation: 1665
You haven't said you're using Spark, but the question is tagged with it, so I assume you are.
The ORC file format is currently heavily tied to the HiveContext in Spark (and I think only available in 1.4 and up), but if you create a Hive context, you should be able to write DataFrames to ORC files in the same way you can with Parquet, for example:
import org.apache.spark.sql._
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
val df = sqlContext.read.avro("/input/path") // requires the spark-avro package (com.databricks.spark.avro)
df.write.format("orc").save("/path/to/use")
If you're reading the Avro data via the Spark DataFrames API, then that's all you should need, but there are more details on the Hortonworks blog.
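For completeness, the ORC files written above can be read back through the same HiveContext. This is a minimal sketch; the path and variable names are placeholders matching the example above:

```scala
// Read the ORC output back into a DataFrame via the HiveContext
// created earlier ("/path/to/use" is the output path from the write step).
val orcDf = sqlContext.read.format("orc").load("/path/to/use")
orcDf.printSchema() // verify the schema round-tripped from Avro
```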
Upvotes: 2