vab2048

What is the expected result of deserializing an Avro message when you do not have the generated Java POJO?

Suppose I have a number of Avro schema definitions: (i) Event1, (ii) Event2 and (iii) EventWrapper.

EventWrapper is a record with one field, named payload, whose type is a union of Event1 and Event2.

I also have a single topic, and the Confluent schema registry has been set up so that when a message is sent to that topic, the subject name resolves to the EventWrapper schema.

There is a producer and a consumer, both of which have access to the POJO classes generated from the schemas above. Everything works fine: the producer produces an EventWrapper message, and the consumer, using the KafkaAvroDeserializer with specific.avro.reader: true, deserializes the payload field of the EventWrapper message to the correct POJO type (Event1 or Event2).
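A consumer set up as described might use properties along these lines. This is a sketch under assumptions, not the actual code from the question: the broker address, registry URL and group id are placeholders, and String keys are assumed. Only value.deserializer and specific.avro.reader reflect the setup described. The resulting Properties would be passed to new KafkaConsumer<>(props).

```java
import java.util.Properties;

final class ConsumerConfigSketch {
    private ConsumerConfigSketch() {}

    static Properties consumerProperties() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.setProperty("group.id", "event-wrapper-consumer");    // hypothetical consumer group
        // Assumption: message keys are plain strings.
        props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        props.setProperty("schema.registry.url", "http://localhost:8081"); // placeholder registry URL
        // Ask the deserializer for generated SpecificRecord classes (e.g. EventWrapper)
        // instead of GenericData.Record.
        props.setProperty("specific.avro.reader", "true");
        return props;
    }

    public static void main(String[] args) {
        consumerProperties().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```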

But now suppose I add a new event schema, Event3, and update EventWrapper to v2, whose union now includes Event1 | Event2 | Event3. The producer's generated POJOs have been updated, but the consumer's have not.

The producer goes and produces messages containing payloads of Event3 but the consumer has the old generated POJO definition for EventWrapper and does not have a generated POJO for Event3 at all.

What should be the expected result when the consumer receives an EventWrapper message that contains Event3 as the payload?

  1. Should the consumer get a deserialization error?
  2. Should the consumer be able to deserialize Event3, but as an org.apache.avro.generic.GenericData.Record instead?

When using io.confluent:kafka-avro-serializer:7.6.0, my consumer deserializes Event3 to a GenericData.Record, but with a previous version (7.2.x) it actually deserializes Event3 as Event1 (which I am pretty sure is a bug).

What is the correct behaviour? Is deserialization to a GenericData.Record expected because the consumer can fetch the Event3 schema from the schema registry, but cannot expose the data as the Event3 POJO type?

Can I depend on the consumer always being able to deserialize such messages as a GenericData.Record?

Answers (1)

jon hanson

When Avro data is serialised with Confluent's KafkaAvroSerializer, a reference to the writer's schema (its schema registry ID) is included in each message. When the data is deserialised, the writer's schema is fetched from the registry and compared to the schema the client expects (the reader's schema). If the schemas don't match, there is a schema resolution process:

https://avro.apache.org/docs/1.11.1/specification/#schema-resolution

This is designed to let a reader using one version of a schema read data written with a different version of it, for example code written for a newer schema reading data generated with an older one.
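With Confluent's serializer, what travels with each Kafka message is a small header rather than the full schema: a magic byte (0), then the schema registry ID as a 4-byte big-endian integer, then the Avro binary body. The deserializer uses that ID to look up the writer's schema. The sketch below only parses that header using plain java.nio; the class and method names are hypothetical, and it does not decode the Avro body itself.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical helper for the Confluent wire-format header (magic byte + 4-byte schema ID). */
final class ConfluentFrame {
    private static final byte MAGIC_BYTE = 0x0;
    private static final int HEADER_LENGTH = 5; // 1 magic byte + 4-byte schema ID

    private ConfluentFrame() {}

    /** Returns the schema registry ID embedded in a Confluent-framed message. */
    static int schemaId(byte[] message) {
        if (message == null || message.length < HEADER_LENGTH) {
            throw new IllegalArgumentException("Message too short for Confluent framing");
        }
        ByteBuffer buffer = ByteBuffer.wrap(message); // ByteBuffer is big-endian by default
        byte magic = buffer.get();
        if (magic != MAGIC_BYTE) {
            throw new IllegalArgumentException("Unknown magic byte: " + magic);
        }
        return buffer.getInt(); // bytes 1-4: the writer schema's ID in the registry
    }

    /** Returns the Avro binary body that follows the header. */
    static byte[] avroPayload(byte[] message) {
        schemaId(message); // validates the header before slicing
        return Arrays.copyOfRange(message, HEADER_LENGTH, message.length);
    }

    public static void main(String[] args) {
        // Made-up message: magic byte, schema ID 42 (0x2A), then two body bytes.
        byte[] message = {0x0, 0x0, 0x0, 0x0, 0x2A, 0x02, 0x04};
        System.out.println("schema id = " + schemaId(message));
        System.out.println("payload bytes = " + avroPayload(message).length);
    }
}
```

Running main prints the schema ID 42 and a body length of 2 for the made-up message.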

Although I can't find it mentioned in the documentation, if the client is unable to resolve the two schemas, then it falls back on returning a GenericData.Record: a generic representation of the data that still carries its own schema (via getSchema()), rather than an instance of a generated POJO.
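One place this kind of fallback shows up in Avro's Java library is SpecificData: when it needs to create a record, it looks up the generated class by the record schema's full name and, if no such class can be loaded, creates a generic record instead. The sketch below is a simplified, hypothetical illustration of that lookup using plain Class.forName. It is not Avro's actual code, and the names SpecificClassLookup and com.example.Event3 are made up.

```java
import java.util.Optional;

/** Simplified illustration of choosing between a generated class and a generic record. */
final class SpecificClassLookup {
    private SpecificClassLookup() {}

    /** Returns the class named by a record schema's full name, if it is on the classpath. */
    static Optional<Class<?>> findGeneratedClass(String schemaFullName) {
        try {
            return Optional.of(Class.forName(schemaFullName));
        } catch (ClassNotFoundException e) {
            return Optional.empty(); // no generated POJO available: fall back to a generic record
        }
    }

    /** Describes which representation a reader would produce for the given record name. */
    static String representationFor(String schemaFullName) {
        return findGeneratedClass(schemaFullName)
                .map(c -> "specific: " + c.getName())
                .orElse("generic: GenericData.Record");
    }

    public static void main(String[] args) {
        // java.lang.String stands in for a class that exists; com.example.Event3 is a
        // hypothetical name for the Event3 POJO that the old consumer does not have.
        System.out.println(representationFor("java.lang.String"));
        System.out.println(representationFor("com.example.Event3"));
    }
}
```

On the consumer side this means code handling the payload should be prepared for an object that is not one of its generated classes, for example by checking its type before casting.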
