flybonzai

How does Confluent's Schema Registry assign schema IDs?

I'm using the CachedSchemaRegistryClient and its register method, which takes a subject and an Avro schema. I'm running these against the Confluent 5.2.1 Docker images, and when I register a schema I see behavior that I find strange.

The first schema I register returns an ID of 81 (I confirmed through the Schema Registry REST API that this schema is tied to this ID), and the second schema returns an ID of 121.

I expected IDs to start at 1 and increment, and I haven't been able to find an explanation via Google. Is there a hashing strategy or something similar used to assign schema IDs?

Answers (1)

Giorgos Myrianthous

The Confluent documentation explains how unique IDs are assigned to schemas:

Schema Registry is a distributed storage layer for Avro Schemas which uses Kafka as its underlying storage mechanism. Some key design decisions:

  • Assigns globally unique ID to each registered schema. Allocated IDs are guaranteed to be monotonically increasing but not necessarily consecutive.
  • Kafka provides the durable backend, and functions as a write-ahead changelog for the state of Schema Registry and the schemas it contains.
  • Schema Registry is designed to be distributed, with single-primary architecture, and ZooKeeper/Kafka coordinates primary election (based on the configuration).
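The first bullet is the key point for this question: IDs are unique and increasing, but gaps between them are allowed. A minimal Python sketch of that contract (the function name is hypothetical and not part of any Confluent client) shows why 81 followed by 121 is a valid sequence:

```python
from typing import Sequence


def is_valid_id_sequence(ids: Sequence[int]) -> bool:
    """Hypothetical check: IDs must strictly increase, but gaps between them are allowed."""
    return all(earlier < later for earlier, later in zip(ids, ids[1:]))


print(is_valid_id_sequence([81, 121]))  # True: increasing, with a gap
print(is_valid_id_sequence([1, 2, 3]))  # True: consecutive IDs are also fine
print(is_valid_id_sequence([121, 81]))  # False: this would violate monotonicity
```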

It also explains:

Schema ID Allocation

Schema ID allocation always happens in the primary node and Schema IDs are always monotonically increasing.

If you are using Kafka primary election, the Schema ID is always based off the last ID that was written to Kafka store. During a primary re-election, batch allocation happens only after the new primary has caught up with all the records in the store <kafkastore.topic>.
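A rough Python sketch of Kafka-based allocation, assuming the new primary continues at one past the highest ID it finds in the store once it has caught up. The class names, subject names, and the `+ 1` step are illustrative assumptions, not Confluent's actual implementation:

```python
from typing import List, Optional, Tuple


class KafkaStoreSketch:
    """Hypothetical stand-in for the <kafkastore.topic> changelog of (subject, schema_id) records."""

    def __init__(self, records: Optional[List[Tuple[str, int]]] = None) -> None:
        self.records: List[Tuple[str, int]] = list(records or [])


class KafkaElectedPrimarySketch:
    """Hypothetical primary that derives its next ID from the store, not from a local counter."""

    def __init__(self, store: KafkaStoreSketch) -> None:
        self.store = store
        self.next_id: Optional[int] = None  # unknown until the primary has caught up

    def catch_up(self) -> None:
        # Assumption: after replaying every record, continue at one past the largest ID written.
        max_id = max((schema_id for _, schema_id in self.store.records), default=0)
        self.next_id = max_id + 1

    def register(self, subject: str) -> int:
        if self.next_id is None:
            raise RuntimeError("primary has not caught up with the store yet")
        schema_id = self.next_id
        self.store.records.append((subject, schema_id))
        self.next_id += 1
        return schema_id


store = KafkaStoreSketch([("orders-value", 80)])
primary = KafkaElectedPrimarySketch(store)
primary.catch_up()
print(primary.register("payments-value"))  # 81

# After a re-election, the new primary replays the same store and continues from there.
new_primary = KafkaElectedPrimarySketch(store)
new_primary.catch_up()
print(new_primary.register("users-value"))  # 82
```

Because the next ID is always recomputed from what is durably in the store, a new primary in this sketch never reuses an ID that was already written.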

If you are using ZooKeeper primary election, /<schema.registry.zk.namespace>/schema_id_counter path stores the upper bound on the current ID batch, and new batch allocation is triggered by both primary election and exhaustion of the current batch. This batch allocation helps guard against potential zombie-primary scenarios (for example, if the previous primary had a GC pause that lasted longer than the ZooKeeper timeout, triggering primary re-election).
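A rough Python sketch of ZooKeeper batch allocation, showing how a re-election can leave unused IDs behind. The batch size of 20 and all class names are hypothetical choices for illustration; the counter object only stands in for the schema_id_counter path:

```python
from typing import Tuple

BATCH_SIZE = 20  # hypothetical batch size, chosen only for illustration


class ZkCounterSketch:
    """Hypothetical stand-in for /<schema.registry.zk.namespace>/schema_id_counter.

    It holds only the upper bound of the most recently claimed ID batch.
    """

    def __init__(self) -> None:
        self.upper_bound = 0

    def claim_batch(self) -> Tuple[int, int]:
        # In a real deployment this bound lives in ZooKeeper; here it is a plain field.
        lower = self.upper_bound + 1
        self.upper_bound += BATCH_SIZE
        return lower, self.upper_bound


class ZkElectedPrimarySketch:
    """Hypothetical primary that hands out IDs from its current batch."""

    def __init__(self, counter: ZkCounterSketch) -> None:
        self.counter = counter
        # Primary election triggers a new batch allocation.
        self.next_id, self.batch_end = counter.claim_batch()

    def register(self) -> int:
        if self.next_id > self.batch_end:
            # Exhausting the current batch also triggers a new allocation.
            self.next_id, self.batch_end = self.counter.claim_batch()
        schema_id = self.next_id
        self.next_id += 1
        return schema_id


counter = ZkCounterSketch()
first_primary = ZkElectedPrimarySketch(counter)  # claims IDs 1..20
print(first_primary.register(), first_primary.register())  # 1 2

# Re-election: the new primary claims 21..40, so IDs 3..20 are never used.
second_primary = ZkElectedPrimarySketch(counter)
print(second_primary.register())  # 21
```

In this sketch the IDs stay increasing but skip the unused remainder of the abandoned batch, which is one way gaps such as 81 followed by 121 can appear.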
