flybonzai

Reputation: 3931

How to extract subset of fields from Avro record then write out to S3 using another schema?

We have an S3 connector that reads from a topic, batches the Avro records together as-is, and writes them out to S3 as .avro files.

My use case is that I would like a smarter connector that extracts a subset of fields, then writes them out to S3 as .avro files using a pre-defined schema (registered in Schema Registry) that matches the subset of fields I extracted.

SMTs seem like a good way to go here, but ExtractField only works on a single field (as far as I can tell). Is there an easy way to satisfy the above use case using built-in SMTs, or do I have to write a custom solution? This seems like something that would be commonly needed.

Upvotes: 0

Views: 87

Answers (1)

OneCricketeer

Reputation: 191743

SMTs are meant to be simple... The common solution for this is to use a stream processor (Kafka Streams, ksqlDB, Flink, Spark, etc.) to write the subset of fields you want to a new topic (using a new schema, if needed), then create the sink connector from that topic.
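For illustration, here is a minimal Kafka Streams sketch of that approach (not from the answer itself). The source topic `orders`, output topic `orders-subset`, the two-field target schema, and the local broker/Schema Registry URLs are all assumptions; the existing S3 sink connector would then be pointed at `orders-subset`.

```java
import java.util.Map;
import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

import io.confluent.kafka.streams.serdes.avro.GenericAvroSerde;

public class SubsetProjector {

    // Hypothetical target schema containing only the fields to keep; in practice
    // this would be the schema already registered in Schema Registry.
    private static final Schema SUBSET_SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"OrderSubset\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"string\"},"
      + "{\"name\":\"amount\",\"type\":\"double\"}]}");

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "avro-subset-projector");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // Avro serde backed by Schema Registry (URL is an assumption)
        GenericAvroSerde valueSerde = new GenericAvroSerde();
        valueSerde.configure(
            Map.of("schema.registry.url", "http://localhost:8081"),
            false); // false = this serde is for record values, not keys

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, GenericRecord> source =
            builder.stream("orders", Consumed.with(Serdes.String(), valueSerde));

        // Project each record onto the smaller schema, copying only the wanted fields
        KStream<String, GenericRecord> subset = source.mapValues(record -> {
            GenericRecord out = new GenericData.Record(SUBSET_SCHEMA);
            out.put("id", record.get("id"));
            out.put("amount", record.get("amount"));
            return out;
        });

        // The S3 sink connector is then pointed at this topic instead
        subset.to("orders-subset", Produced.with(Serdes.String(), valueSerde));

        new KafkaStreams(builder.build(), props).start();
    }
}
```

Records on the new topic already conform to the smaller schema, so the existing as-is S3 sink connector can be reused unchanged against `orders-subset`.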

Upvotes: 0
