Sachin

Reputation: 3544

How to append to an ORC file

We have a requirement to append to ORC files. I searched online but found nothing. Also, ORC's org.apache.hadoop.hive.ql.io.orc.WriterImpl does not have an append API. Is there any way to append to ORC files (more specifically, using Java)?

Upvotes: 5

Views: 2621

Answers (1)

Samson Scharfrichter

Reputation: 9067

ORC data files are subdivided into independent stripes; each stripe is created in a single atomic step. See the official documentation for details.

I don't believe you can directly append to an existing file on-the-fly. That would mean leaving a corrupt stripe (hence a corrupt file) in case of a job crash while writing.

But you can

  • create a new ORC data file per reducer (each file will contain 1..N stripes, depending on actual data volume vs. the orc.stripe.size property)
  • then "concatenate" these data files, together with the existing file(s), using Hive 0.14 or later (ALTER TABLE ... CONCATENATE)
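Since the question asks about Java, here is a rough sketch of how the concatenation step could be triggered from Java over JDBC. It assumes the new ORC files have already been written into the table (or partition) directory and that a Hive JDBC connection is available; the class name `OrcConcat`, its method names, and the table and partition names are hypothetical, and table names and partition values are assumed to come from trusted input (no escaping is done).

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds and runs Hive's ALTER TABLE ... CONCATENATE,
// which merges the ORC files of a table or partition (Hive 0.14+).
final class OrcConcat {

    // Builds the HiveQL statement. An empty partitionSpec targets the whole table.
    // Inputs are assumed trusted; nothing is escaped here.
    static String concatenateStatement(String table, Map<String, String> partitionSpec) {
        StringBuilder sql = new StringBuilder("ALTER TABLE ").append(table);
        if (!partitionSpec.isEmpty()) {
            StringJoiner spec = new StringJoiner(", ", " PARTITION (", ")");
            for (Map.Entry<String, String> e : partitionSpec.entrySet()) {
                spec.add(e.getKey() + " = '" + e.getValue() + "'");
            }
            sql.append(spec);
        }
        return sql.append(" CONCATENATE").toString();
    }

    // Runs the statement on an already-open connection (e.g. obtained from the
    // Hive JDBC driver, which is an external dependency not shown here).
    static void concatenate(Connection hive, String table, Map<String, String> partitionSpec)
            throws SQLException {
        try (Statement st = hive.createStatement()) {
            st.execute(concatenateStatement(table, partitionSpec));
        }
    }
}
```

For example, calling `OrcConcat.concatenate(conn, "sales", partition)` with a map containing `dt -> 2015-01-01` would send `ALTER TABLE sales PARTITION (dt = '2015-01-01') CONCATENATE` to Hive. Use a `LinkedHashMap` if the order of partition keys matters.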

Upvotes: 4
