Reputation: 2973
I have a Kafka topic whose values are in Avro format, with the schema stored in the Schema Registry.
Now I want to set up an S3 sink, following this: https://docs.confluent.io/current/connect/connect-storage-cloud/kafka-connect-s3/docs/s3_connector.html#basic-example
In the webpage, they use
schema.generator.class=io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator
And when I try to read the generated .avro data back, I found the schema is slightly different. For example, a nested enum type became a string, and I can only restore a GenericRecord instead of a SpecificRecord.
Is there a way to specify a schema generator that retrieves the schema from the Schema Registry?
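For reference, the basic example on that page boils down to a sink connector config roughly like this (a sketch; the connector name, topic, bucket, and region are placeholders for my actual values):

```properties
name=s3-sink
connector.class=io.confluent.connect.s3.S3SinkConnector
tasks.max=1
# placeholder topic / bucket / region
topics=my-avro-topic
s3.bucket.name=my-bucket
s3.region=us-east-1
flush.size=1000
storage.class=io.confluent.connect.s3.storage.S3Storage
format.class=io.confluent.connect.s3.format.avro.AvroFormat
schema.generator.class=io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator
partitioner.class=io.confluent.connect.storage.partitioner.DefaultPartitioner
```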
Upvotes: 0
Views: 1776
Reputation: 191671
The schema is received from the registry assuming you've used
format.class=io.confluent.connect.s3.format.avro.AvroFormat
And if Connect couldn't reach the registry, it would actually fail to write the Avro records
You set up the registry configuration in the Kafka Connect worker property file (named something like connect-avro.properties), not in the connector config itself.
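In the worker file, that configuration typically looks like the following (a sketch; the registry URL is a placeholder for your environment):

```properties
# Use the Avro converter so records are (de)serialized via Schema Registry
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
```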
And it is converted to a GenericRecord because your SpecificRecord class is more than likely not on the Connect classpath. The extra non-schema data Connect adds is just metadata, but you can disable it with
connect.meta.data=false
The property you mentioned is actually only used by the HDFS connector to generate a Hive schema, not by the S3 connector with Avro. In any case, that property is no longer required post 3.3.0, if I recall the commit that removed it correctly.
Regarding enums: yes, they're converted to strings. It's a known open issue that I believe has only been addressed in the latest release (Confluent 4.1). You'll need to set this property to fix it:
enhanced.avro.schema.support=true
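Putting both converter options together, the worker config would look something like this sketch (assuming the standard `value.converter.` prefix applies to these AvroConverter options):

```properties
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
# drop the extra Connect metadata from the written Avro schema
value.converter.connect.meta.data=false
# preserve Avro enums instead of flattening them to strings
value.converter.enhanced.avro.schema.support=true
```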
Upvotes: 4