Xiang Zhang

Reputation: 2973

Can I use Schema Registry to get the schema when using the Kafka Connect S3 sink?

I have a Kafka topic whose values are in Avro format, with the schema stored in Schema Registry.

Now I want to set up an S3 sink, following this: https://docs.confluent.io/current/connect/connect-storage-cloud/kafka-connect-s3/docs/s3_connector.html#basic-example

The example on that page uses:

schema.generator.class=io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator

But when I reload the generated .avro data, the schema is slightly different. For example, a nested enum type became a string, and I can only restore a GenericRecord instead of a SpecificRecord.

Is there a way to specify a schema generator that retrieves the schema from Schema Registry?

Upvotes: 0

Views: 1776

Answers (1)

OneCricketeer

Reputation: 191671

The schema is retrieved from the registry, assuming you've used:

format.class=io.confluent.connect.s3.format.avro.AvroFormat

If Connect couldn't reach the registry, it would fail to write the Avro records at all.

You set up the registry configuration in the Kafka Connect worker properties file (named something like connect-avro.properties), not in the connector itself.
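As a rough sketch of what that worker file might contain, assuming a local broker and a registry at http://localhost:8081 (both addresses are placeholders, not values from the question):

```properties
# Worker-level settings (e.g. connect-avro.properties); hosts and ports are assumed
bootstrap.servers=localhost:9092

# Deserialize record values as Avro, looking schemas up in Schema Registry
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
```

If the keys are Avro as well, the same two settings would be repeated with the key.converter prefix.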

The data is converted to a generic record because your specific record class is most likely not on the Connect classpath. The "extra non-schema" data that Connect adds is just metadata, and you can disable it with:

connect.meta.data=false 

The property you mentioned is only used by HDFS Connect for a Hive schema, not by S3 Connect with Avro schemas. At least, that property has not been required since 3.3.0, if I correctly recall the commit that removed it.

Regarding enums: yes, they're converted to strings. It's a known issue that I believe has been addressed only in the latest release (Confluent 4.1).

You'll need to set this property to fix it:

enhanced.avro.schema.support=true 
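Putting the pieces together, a connector properties file might look roughly like the sketch below. The connector name, topic, bucket, region, and flush size are hypothetical placeholders, and depending on your version the two Avro options may instead need to be set on the worker's converter with the value.converter. prefix, so treat this as a starting point rather than a verified configuration:

```properties
# Hypothetical S3 sink connector config; name, topic, bucket and region are placeholders
name=s3-sink
connector.class=io.confluent.connect.s3.S3SinkConnector
tasks.max=1
topics=my_avro_topic
s3.bucket.name=my-bucket
s3.region=us-west-2
flush.size=3
storage.class=io.confluent.connect.s3.storage.S3Storage

# Write Avro files using the schema that came with the records
format.class=io.confluent.connect.s3.format.avro.AvroFormat

# Keep enums as enums and drop the extra Connect metadata from the written schema
enhanced.avro.schema.support=true
connect.meta.data=false
```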

Upvotes: 4
