Achilleus

Reputation: 1944

Kafka Avro Schema evolution

I am relatively new to this and I am trying to learn more about the Avro schemas that we use for our Kafka topics.

I was wondering whether there is a way to evolve a schema in a particular situation. We need to update our schema with new fields that can't be null and can't have default values, because these new fields are identifiers. The workaround is to create new topics, but is there a better way to evolve the existing schema?

Upvotes: 1

Views: 2350

Answers (1)

Treziac

Reputation: 3264

There are four possible compatibility modes for a topic:

  • Forward: a client that expects the old version of the schema can read the new version
  • Backward: a client that expects the new version of the schema can read the old version
  • Both: both of the above
  • None: none of the above
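
If you want to check these rules programmatically, the Avro Java library ships an org.apache.avro.SchemaCompatibility helper. Here is a minimal sketch; the two User schemas are made up for illustration and match the name/surname example further down:

    import org.apache.avro.Schema;
    import org.apache.avro.SchemaCompatibility;

    public class CompatibilityCheck {
        public static void main(String[] args) {
            Schema v1 = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
              + "{\"name\":\"name\",\"type\":\"string\"}]}");
            Schema v2 = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
              + "{\"name\":\"name\",\"type\":\"string\"},"
              + "{\"name\":\"surname\",\"type\":\"string\",\"default\":\"NC\"}]}");

            // Backward: a reader using the new schema (v2) reads data written with v1.
            System.out.println("backward: " + SchemaCompatibility
                .checkReaderWriterCompatibility(v2, v1).getType());
            // Forward: a reader using the old schema (v1) reads data written with v2.
            System.out.println("forward:  " + SchemaCompatibility
                .checkReaderWriterCompatibility(v1, v2).getType());
        }
    }

With the default in place both checks print COMPATIBLE; remove the "default" attribute from "surname" and the backward check reports INCOMPATIBLE - which is exactly the situation in the question.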

Bear in mind that there will be periods where some producers still send data with the old schema while others already send the new one, and consumers will see a mix of old and new data.

How would clients behave in your case?

  • adding a field is always forward compatible (old clients just drop the new field)
  • it is backward compatible only if you specify a default value

Also, this is only true if you plan to deserialize the data into a specific schema (with the corresponding POCO, for example) - if you just convert it to JSON and do custom processing, a single client could handle both schemas.

So, as I see it, there are two possible ways to keep writing to the same topic:

  • You set a default value. You may be misunderstanding default values: it doesn't mean a default value will be written, but rather (quoting the Avro spec)

    A default value for this field, used when reading instances that lack this field (optional)

For example, if you previously had a "name" and want to add "surname", you can set the default for "surname" to "NC" (or an empty string), as you may have done in a database (see the first sketch after this list).

  • You set your compatibility to none (or forward), so that you can update your schema (by default, compatibility is backward). In this case, clients expecting the new schema won't be able to process old data. But it could fit your usage if you only process incoming data: change the compatibility, update all your producers (so that only new data arrives), then update the clients expecting the new schema - and remember to set compatibility back to backward, or whatever compatibility you really want, afterwards (see the second sketch below).
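
To make option 1 concrete, here is a minimal sketch of Avro's schema resolution using the plain Avro Java library; the User schema with "name"/"surname" and the "NC" default mirror the example above, and nothing here is Kafka-specific:

    import java.io.ByteArrayOutputStream;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryDecoder;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.DecoderFactory;
    import org.apache.avro.io.EncoderFactory;

    public class DefaultValueDemo {
        public static void main(String[] args) throws Exception {
            Schema v1 = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
              + "{\"name\":\"name\",\"type\":\"string\"}]}");
            Schema v2 = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
              + "{\"name\":\"name\",\"type\":\"string\"},"
              + "{\"name\":\"surname\",\"type\":\"string\",\"default\":\"NC\"}]}");

            // Serialize a record with the old schema (no surname on the wire).
            GenericRecord rec = new GenericData.Record(v1);
            rec.put("name", "Alice");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(v1).write(rec, enc);
            enc.flush();

            // Deserialize with writer schema v1 and reader schema v2:
            // the missing "surname" is filled in from the default.
            BinaryDecoder dec = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
            GenericRecord upgraded =
                new GenericDatumReader<GenericRecord>(v1, v2).read(null, dec);
            System.out.println(upgraded); // {"name": "Alice", "surname": "NC"}
        }
    }

The record was written without "surname", yet the reader sees one: the default is applied at read time, which is all backward compatibility requires.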
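
For option 2, the compatibility level is a per-subject setting in the Schema Registry; the underlying REST call is PUT /config/<subject> with a body like {"compatibility": "NONE"}. Below is a sketch using Confluent's Java client - the registry URL and the subject name my-topic-value are assumptions, and updateCompatibility is the client method wrapping that REST call (the exact class and method names have moved around between client versions, so check yours):

    import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
    import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;

    public class CompatibilityConfigDemo {
        public static void main(String[] args) throws Exception {
            // Assumed registry URL and subject name - adjust to your setup.
            SchemaRegistryClient registry =
                new CachedSchemaRegistryClient("http://localhost:8081", 100);

            // Relax the check, register the new (otherwise incompatible)
            // schema version and roll out the producers, then restore the
            // compatibility level you really want.
            registry.updateCompatibility("my-topic-value", "NONE");
            // ... register the new schema / update producers here ...
            registry.updateCompatibility("my-topic-value", "BACKWARD");
        }
    }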

I would go with option 1.

Upvotes: 6
