kylie.zoltan

Reputation: 469

Why use a schema registry

I just started working with Kafka, using Protocol Buffers as the message format, and I have just learned about the schema registry.

To give some context: we are a small team with about a dozen web services that communicate through Kafka. We store all the schemas and read/write models in a library that each service imports, so every service knows how to serialize and deserialize a message.

But now the schema registry comes into play. Why use it? My infrastructure becomes more complicated, I need to update the registry every time I change a schema, and I still need to define the read/write models in each service, as I do now with the library.

So from my point of view I only see cons, mainly added complexity. Why should I use a schema registry?

Thanks

Upvotes: 4

Views: 1535

Answers (2)

OneCricketeer

Reputation: 191671

Kafka just accepts bytes; it doesn't guarantee that your records contain any specific set of data.

The schema registry lets you define a base compatibility guarantee for your messages (starting from the first version of the schema). It can also be used to validate that the producer uses a registered schema, which prevents "other data" from being written to the topic.

For example, you have a schema that describes an event like {"first_name": "Jane", "last_name": "Doe"}, but later decide that names can actually have more than two parts, so you move to a schema that supports {"name": "Jane P. Doe"}. You still need a way to deserialize old data with the first_name and last_name fields in order to migrate it to the new schema, which has only name. Therefore, consumers need both schemas. The registry holds both, and the producer's serializer embeds the schema ID in each payload. After all, the initial events with the two name fields know nothing about the "future" schema with only name.
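To make the "schema ID within each payload" part concrete, here is a minimal sketch of the framing used by Confluent's serializers: one magic byte (0), then the schema ID as a 4-byte big-endian integer, then the serialized record. The schema IDs 1 and 2, the `to_current` helper, and the use of JSON as the record encoding are assumptions made for illustration only; real payloads would be Avro or Protobuf bytes, and Confluent's Protobuf serializer writes additional message-index information after the ID, which this sketch omits.

```python
import json
import struct
from typing import Tuple

MAGIC_BYTE = 0   # marks the Confluent wire format
SCHEMA_V1 = 1    # hypothetical ID: {"first_name", "last_name"}
SCHEMA_V2 = 2    # hypothetical ID: {"name"}


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix a serialized record with the magic byte and schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes) -> Tuple[int, bytes]:
    """Split a framed message into (schema_id, payload)."""
    if len(message) < 5:
        raise ValueError("message is shorter than the 5-byte header")
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[5:]


def to_current(message: bytes) -> dict:
    """Decode a message written with either schema into the newest shape."""
    schema_id, payload = unframe(message)
    record = json.loads(payload)  # stand-in for an Avro/Protobuf decoder
    if schema_id == SCHEMA_V1:
        # Old events: merge the two name fields into the single new field.
        return {"name": f"{record['first_name']} {record['last_name']}"}
    if schema_id == SCHEMA_V2:
        return record
    raise ValueError(f"unknown schema ID: {schema_id}")


old_event = frame(SCHEMA_V1, json.dumps({"first_name": "Jane", "last_name": "Doe"}).encode())
new_event = frame(SCHEMA_V2, json.dumps({"name": "Jane P. Doe"}).encode())
print(to_current(old_event))
print(to_current(new_event))
```

In a real consumer the ID would be used to fetch the writer's schema from the registry (usually cached) instead of being matched against hard-coded constants.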

You say your models are shared in libraries across services. You probably have some regression testing and a release cycle to publish these between services? The registry will allow you to centralize that logic.

Upvotes: 2

eik

Reputation: 4590

The schema registry was developed for the Apache Avro format and was later repurposed as one of Confluent's core products.

As OneCricketeer correctly remarked, you need the writer's schema to decode Avro.

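Protocol Buffers was developed by Google and doesn't need a schema registry to be forward and backward compatible. Every field on the wire carries its field number and wire type, so a reader with an older schema can read newer messages and simply skip the fields it does not know. That is not possible with Avro without the writer's schema.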
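As a rough illustration of why an older reader can cope with newer Protobuf messages, the sketch below hand-rolls a small part of the Protobuf wire format (varint and length-delimited fields only) rather than using the real `protobuf` library. The field numbers, the field names, and the `decode` helper are assumptions chosen for the example; treating every length-delimited field as a UTF-8 string is a simplification.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    """Return (value, next_position) for the varint starting at pos."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def string_field(number: int, value: str) -> bytes:
    data = value.encode("utf-8")
    return encode_varint((number << 3) | 2) + encode_varint(len(data)) + data


def varint_field(number: int, value: int) -> bytes:
    return encode_varint((number << 3) | 0) + encode_varint(value)


def decode(buf: bytes, known: dict) -> dict:
    """Decode buf, keeping only field numbers listed in `known`."""
    record, pos = {}, 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:        # varint
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:      # length-delimited (treated as a string here)
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number in known:       # unknown fields are parsed past and dropped
            record[known[number]] = value
    return record


# A newer producer added fields 3 and 4; the old reader only knows 1 and 2.
newer_message = (string_field(1, "Jane") + string_field(2, "Doe")
                 + string_field(3, "P.") + varint_field(4, 300))
old_schema = {1: "first_name", 2: "last_name"}
print(decode(newer_message, old_schema))
```

The old reader never needs the writer's schema because the wire type alone tells it how many bytes to skip. This only holds as long as field numbers are never reused with a different meaning or type.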

Due to the popularity of Protocol Buffers, support was added for customers who were already using the schema registry.

But you don't need the Confluent Schema Registry when using Protocol Buffers: it complicates your infrastructure if you are not already using it and provides no benefits in this case.

Upvotes: 0
