Reputation: 3931
We have an S3 connector that reads from a topic, batches the Avro records together as-is, and then writes them out to S3 as .avro files.
My use case is that I would like a smarter connector that extracts a subset of the fields and writes them out to S3 as .avro files, but using a pre-defined schema (registered in Schema Registry) that matches the subset of fields I extracted.
SMTs seem like a good way to go here, but ExtractField only works on a single field (as far as I can tell). Is there an easy way to satisfy the above use case using built-in SMTs, or do I have to write a custom solution? This seems like something that would be commonly needed.
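For reference, here is a minimal sketch (field names and topic name are made up) of what `ExtractField$Value` does to a record whose value is a struct: it replaces the entire value with the one named field, rather than projecting out a subset of fields.

```java
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.transforms.ExtractField;

import java.util.Map;

public class ExtractFieldDemo {
    public static void main(String[] args) {
        // Hypothetical value schema with three fields
        Schema schema = SchemaBuilder.struct()
                .field("id", Schema.STRING_SCHEMA)
                .field("name", Schema.STRING_SCHEMA)
                .field("email", Schema.STRING_SCHEMA)
                .build();
        Struct value = new Struct(schema)
                .put("id", "42")
                .put("name", "Jane")
                .put("email", "jane@example.com");

        // Equivalent to transforms.x.type=...ExtractField$Value, transforms.x.field=id
        ExtractField.Value<SourceRecord> smt = new ExtractField.Value<>();
        smt.configure(Map.of("field", "id"));

        SourceRecord record = new SourceRecord(null, null, "user-events", schema, value);
        SourceRecord transformed = smt.apply(record);

        // The whole struct is replaced by the single "id" field's value
        System.out.println(transformed.value());       // 42
        System.out.println(transformed.valueSchema()); // Schema{STRING}
        smt.close();
    }
}
```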
Upvotes: 0
Views: 87
Reputation: 191743
SMTs are meant to be simple. The common solution for this is to use a stream processor (Kafka Streams, ksqlDB, Flink, Spark, etc.) to write a new topic containing just the subset of fields you want (with the new schema, if needed), and then create the sink connector on that topic.
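For example, here is a minimal Kafka Streams sketch (topic names, field names, the target schema, and the localhost broker/Schema Registry URLs are all placeholders) that projects two fields out of the incoming Avro records into a new topic, using Confluent's `GenericAvroSerde`:

```java
import io.confluent.kafka.streams.serdes.avro.GenericAvroSerde;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Map;
import java.util.Properties;

public class ProjectFieldsApp {

    // Placeholder target schema matching the subset of fields; in practice this
    // is the schema you already registered in Schema Registry.
    private static final Schema TARGET_SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"UserSlim\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"string\"},"
      + "{\"name\":\"email\",\"type\":\"string\"}]}");

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "project-fields");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // Value serde backed by Schema Registry
        GenericAvroSerde avroSerde = new GenericAvroSerde();
        avroSerde.configure(Map.of("schema.registry.url", "http://localhost:8081"), false);

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("source-topic", Consumed.with(Serdes.String(), avroSerde))
               .mapValues(full -> {
                   // Copy only the fields you care about into the slim record
                   GenericRecord slim = new GenericData.Record(TARGET_SCHEMA);
                   slim.put("id", full.get("id"));
                   slim.put("email", full.get("email"));
                   return slim;
               })
               .to("projected-topic", Produced.with(Serdes.String(), avroSerde));

        new KafkaStreams(builder.build(), props).start();
    }
}
```

Your existing S3 sink connector (with the Avro format) can then read from `projected-topic` unchanged.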
Upvotes: 0