Reputation: 1255
Suppose I have a number of Avro schema definitions: (i) `Event1`, (ii) `Event2`, (iii) `EventWrapper`. `EventWrapper` is a record with one field (named `payload`) which is a union of `Event1` and `Event2`.

I also have a single topic, and the Confluent Schema Registry has been set up so that when a message is sent to that topic, the subject name resolves to the `EventWrapper` schema.
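For reference, a minimal sketch of what the v2 `EventWrapper` schema might look like (the `com.example` namespace and the absence of other fields are assumptions on my part):

```json
{
  "type": "record",
  "name": "EventWrapper",
  "namespace": "com.example",
  "fields": [
    { "name": "payload", "type": ["com.example.Event1", "com.example.Event2"] }
  ]
}
```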
There is a producer and a consumer, both of which have access to the generated POJO classes for the schemas mentioned above. Everything works fine: the producer produces an `EventWrapper` message and the consumer, using the `KafkaAvroDeserializer` with `specific.avro.reader: true`, deserializes the payload field within the `EventWrapper` message to the correct POJO type (`Event1` or `Event2`).
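For completeness, the consumer configuration looks roughly like this (broker and registry addresses are placeholders):

```java
import java.util.Properties;

public class ConsumerConfigSketch {
    public static Properties consumerProps() {
        Properties props = new Properties();
        // Placeholder addresses for illustration only
        props.put("bootstrap.servers", "localhost:9092");
        props.put("schema.registry.url", "http://localhost:8081");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        // Return generated SpecificRecord POJOs (EventWrapper) instead of
        // GenericRecord, provided the classes are on the classpath.
        props.put("specific.avro.reader", "true");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps().getProperty("specific.avro.reader"));
    }
}
```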
But now suppose I add a new event schema, `Event3`, and update `EventWrapper` to v2 (with the union updated to `Event1 | Event2 | Event3`). The producer's generated POJOs have been updated but the consumer's have not.

The producer then produces messages containing `Event3` payloads, but the consumer has the old generated POJO definition for `EventWrapper` and does not have a generated POJO for `Event3` at all.
What should be the expected result when the consumer receives an `EventWrapper` message that contains `Event3` as the payload? Should the payload be deserialized as `Event3`, but as an `org.apache.avro.generic.GenericData.Record` instead of a POJO?

When using `io.confluent:kafka-avro-serializer:7.6.0`, my consumer deserializes it to a `GenericData.Record`, but with a previous version (7.2.x) it actually deserializes `Event3` as `Event1` (which I am pretty sure is a bug).

What is the correct behaviour? Is deserialization to a generic `Record` expected because the consumer is able to fetch the `Event3` schema from the schema registry, but just can't expose it as the `Event3` POJO type? Can I depend on the consumer always being able to deserialize such messages as a `GenericData.Record`?
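If the generic-record fallback can be relied on, the consumer-side dispatch could look something like the sketch below. To keep it self-contained, `Event1`, `Event2`, and `GenericRecordStub` are placeholder classes standing in for the generated POJOs and `org.apache.avro.generic.GenericData.Record`:

```java
public class PayloadDispatchSketch {
    // Placeholders for the generated Avro POJOs
    static class Event1 {}
    static class Event2 {}
    // Placeholder for org.apache.avro.generic.GenericData.Record
    static class GenericRecordStub {
        final String schemaName;
        GenericRecordStub(String schemaName) { this.schemaName = schemaName; }
    }

    static String handle(Object payload) {
        if (payload instanceof Event1) return "handled Event1";
        if (payload instanceof Event2) return "handled Event2";
        // Unknown union branch: the deserializer handed back a generic record,
        // so log it, skip it, or route it to a dead-letter topic rather than
        // casting blindly to a known POJO type.
        if (payload instanceof GenericRecordStub) {
            return "unknown payload: " + ((GenericRecordStub) payload).schemaName;
        }
        throw new IllegalStateException("unexpected payload type: " + payload.getClass());
    }

    public static void main(String[] args) {
        System.out.println(handle(new Event1()));
        System.out.println(handle(new GenericRecordStub("Event3")));
    }
}
```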
Upvotes: 0
Views: 53
Reputation: 9408
When Avro data is serialised to binary, the writer's schema travels with it (in an Avro container file the schema is embedded directly; with the Confluent serializer the message carries a schema ID that the consumer looks up in the registry). When the same data is deserialised, the writer's schema is loaded and compared to the reader's schema specified by the client. If the schemas don't match, there is a schema resolution process:
https://avro.apache.org/docs/1.11.1/specification/#schema-resolution
This is designed to allow code written for a newer schema to read data generated with an older schema.
Although I can't find it mentioned in the documentation, if the client is unable to resolve the incoming payload to a specific generated class, it falls back to returning a `GenericData.Record` object: essentially a generic, POJO-free representation of the data.
Upvotes: 1