Sunil

Reputation: 1

What is the meaning of schema evolution for the Parquet and Avro file formats in Hive?

Can anyone explain what schema evolution means for the Parquet and Avro file formats in Hive?

Upvotes: 0

Views: 1742

Answers (1)

Vin

Reputation: 525

Schema evolution is simply the term for how a data store behaves when the schema changes. Users can start with a simple schema and gradually add more columns as needed. In this way, users may end up with multiple Parquet/Avro files with different but mutually compatible schemas.
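To make "different but mutually compatible" a bit more concrete, here is a rough sketch in plain Python. It does not use any Parquet, Avro, or Hive API; the `merge_schemas` helper and the dict-based schemas are hypothetical stand-ins I made up for illustration. It only assumes that two schemas are compatible when every shared column has the same type.

```python
def merge_schemas(*schemas):
    """Union the columns of several schemas (hypothetical helper).

    Each schema is a plain dict of column name -> type name.
    Columns that appear in more than one schema must agree on type.
    """
    merged = {}
    for schema in schemas:
        for name, col_type in schema.items():
            if name in merged and merged[name] != col_type:
                raise ValueError(
                    f"incompatible types for column {name!r}: "
                    f"{merged[name]} vs {col_type}"
                )
            merged[name] = col_type
    return merged


# Schema of the files written first.
schema_v1 = {"id": "int", "name": "string"}
# Schema of files written later, after a column was added.
schema_v2 = {"id": "int", "name": "string", "email": "string"}

# The merged schema covers both generations of files.
merged = merge_schemas(schema_v1, schema_v2)
```

Changing the type of an existing column (say `id` from `int` to `string`) would make `merge_schemas` raise in this sketch. Real formats have their own, more detailed rules, including some permitted type promotions.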

Say you have one Avro/Parquet file and you want to change its schema: you can simply rewrite that file with the new schema inside. But what if you have terabytes of Avro/Parquet files and you want to change their schema? Would you rewrite all of the data every time the schema changes?

Schema evolution allows you to update the schema used to write new data while maintaining backwards compatibility with the schema(s) of your old data. You can then read it all together, as if all of the data had one schema. Of course, there are precise rules governing which changes are allowed in order to maintain compatibility. For Avro, those rules are listed under Schema Resolution in the Avro specification.
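Here is a minimal sketch of that idea, loosely modelled on one of the Avro resolution rules: a field that the reader's schema has but the old data lacks is filled from the reader's default. This is plain Python, not the Avro library; `resolve`, `reader_fields`, and the sample records are hypothetical names and data used only for illustration.

```python
def resolve(record, reader_fields):
    """Project one stored record onto the reader's schema (illustrative only).

    - A field present in the record is copied through.
    - A field missing from the record takes the reader's default.
    - A field missing from the record with no default is an error.
    - Fields in the record that the reader does not list are ignored.
    """
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return resolved


# Old data, written before the "email" column existed.
old_records = [{"id": 1, "name": "alice"}]
# New data, written with the updated schema.
new_records = [{"id": 2, "name": "bob", "email": "bob@example.com"}]

# The reader's (new) schema: "email" carries a default so old data still resolves.
reader_fields = [
    {"name": "id"},
    {"name": "name"},
    {"name": "email", "default": None},
]

# Read everything together as if it all had one schema.
all_rows = [resolve(r, reader_fields) for r in old_records + new_records]
```

Nothing in the old data is rewritten here. The old record simply comes back with `email` set to the default, which is why adding a column with a default is generally a safe change, while adding one without a default is not.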

Upvotes: 3
