Reputation: 1403
I know that a SELECT from a table is implemented by a TableScanOperator, which basically just forwards rows. I have a use case that would become very easy to solve if I could assume that the records in a Hive table are written and read in the order in which they were inserted. Is it correct that when I do a SELECT, I will get the records from the table in the order they were inserted?
Upvotes: 0
Views: 60
Reputation: 18434
No, this is not necessarily correct. Hive makes no guarantees about the order in which it scans files. In practice, each mapper reads a single block of a file in order, but because those mappers may all be running in parallel, they can finish in any order and send their results back in any order.
Is there some reason you wouldn't just use an "order by" clause? If you don't have a column that provides a natural ordering, you can add one, say "insert_ts", set it to the current time when you insert, and order by that.
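To make the point concrete, here is a rough Python sketch of the behaviour described above. It is not Hive code: `scan_blocks` is a hypothetical helper that only imitates parallel mappers by reading each block in order and returning the blocks in an arbitrary order, and the `(insert_ts, payload)` row layout is an assumption for illustration.

```python
import random


def scan_blocks(rows, block_size, seed=None):
    """Imitate a parallel table scan (hypothetical helper, not a Hive API).

    Rows inside one block keep their order, as a single mapper would
    read them, but the blocks themselves come back in arbitrary order.
    """
    blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
    rng = random.Random(seed)
    rng.shuffle(blocks)  # stands in for mappers finishing in any order
    return [row for block in blocks for row in block]


# Assumed row layout: (insert_ts, payload), inserted in timestamp order.
rows = [(ts, f"record-{ts}") for ts in range(1, 13)]

# What a plain SELECT might hand back: same rows, order not guaranteed.
scanned = scan_blocks(rows, block_size=4, seed=7)

# What ordering by insert_ts gives you: insertion order, every time.
ordered = sorted(scanned, key=lambda row: row[0])

print(scanned)
print(ordered == rows)
```

The scan always returns the same set of rows, but only the explicit sort on `insert_ts` is something you can rely on, which is the role an "order by" on that column would play in the actual query.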
Upvotes: 1