I have multiple (independent) Kafka clusters, each with topicA and topicB. Each of these clusters has a kafka-connect running with the S3 sink. Other than the AWS region they're running in, the setup is identical (though they do have different volumes of messages, which isn't relevant here).
Right now, I have topics.dir set to s3://bucketname/{region}/{connector-name}. However, I would like all the topicA data (from all regions) to share a common prefix and all the topicB data (from all regions) to share a different common prefix. I don't strictly need a separate prefix for each region, but I'm assuming that having multiple kafka-connects write to the same place is a recipe for disaster on the off-chance that two of them try to write to the same S3 path.
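To make the collision concern concrete, here is a minimal sketch in Python. It assumes the sink names objects as `<prefix>/<topic>/<encoded partition>/<topic>+<kafka partition>+<zero-padded start offset>.<ext>`; I believe that matches the connector's documented layout, but treat it as an assumption. The function `object_key` and the prefix `shared-prefix` are hypothetical names used only for illustration.

```python
def object_key(prefix: str, topic: str, encoded_partition: str,
               kafka_partition: int, start_offset: int, ext: str = "json") -> str:
    """Build an S3 object key under the ASSUMED S3 sink naming scheme."""
    # Assumed file name: <topic>+<kafka partition>+<10-digit start offset>.<ext>
    file_name = f"{topic}+{kafka_partition}+{start_offset:010d}.{ext}"
    return f"{prefix}/{topic}/{encoded_partition}/{file_name}"


# Two independent clusters each have their own topicA, partition 0, offset 0.
# If both connectors share a prefix and a partition path, the keys are identical.
key_from_cluster_1 = object_key("shared-prefix", "topicA", "2020/01/15", 0, 0)
key_from_cluster_2 = object_key("shared-prefix", "topicA", "2020/01/15", 0, 0)
keys_collide = key_from_cluster_1 == key_from_cluster_2
```

Because the clusters are independent, nothing in the key distinguishes them unless the region (or some other cluster-specific value) appears somewhere in the path.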
It occurred to me that I could tweak the TimeBasedPartitioner to write to {region}/yyyy/mm/dd and change topics.dir to be s3://bucketname/{connector-name}, so my data would land in s3://bucketname/{connector-name}/{topic}/{region}/yyyy/mm/dd. Would this approach work? I don't see why not, though I have yet to try it. (The only obstacle I can imagine is the S3 sink requiring that topics.dir not exist when the connector first starts, but that's not documented anywhere, nor is S3 a file system...)
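The layout I have in mind can be sketched as follows. This is only a Python illustration of the intended prefix, not connector code; `proposed_prefix`, the connector name `s3-sink`, and the region names are hypothetical placeholders.

```python
from datetime import datetime, timezone


def proposed_prefix(connector_name: str, topic: str, region: str, ts: datetime) -> str:
    """Compose {connector-name}/{topic}/{region}/yyyy/mm/dd for a record timestamp."""
    # The region sits after the topic, so every region shares the topic-level prefix.
    return f"{connector_name}/{topic}/{region}/{ts:%Y/%m/%d}"


ts = datetime(2020, 1, 15, tzinfo=timezone.utc)
east = proposed_prefix("s3-sink", "topicA", "us-east-1", ts)
west = proposed_prefix("s3-sink", "topicA", "us-west-2", ts)

# All regions share the topic-level prefix but never the same full path.
shared_topic_prefix = "s3-sink/topicA/"
```

With this layout, listing `s3-sink/topicA/` returns the topicA data from every region, while the region segment keeps each connector writing to its own sub-path.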