Reputation: 14913
I'm currently assessing Apache Kafka for use in our technology stack. One thing that may become critical is a contractual or legal requirement to audit the system's behaviour and to retain that audit information for up to a year.
Given the volume of data we process, we will most likely need to cold-store this rather than simply partitioning the data and setting a long retention period. Cold-store here means storing in Amazon S3 or on multiple locally held multi-TB HDDs.
We could, of course, set up a consumer against every topic that logs the messages out to cold storage. But this feels like it should be a solved problem, and I just can't find a documented solution.
What's the best way of archiving old data from Apache Kafka rather than simply discarding it?
Upvotes: 1
Views: 1460
Reputation: 32100
You could use the Kafka Connect S3 sink connector to stream the data to S3, and then set the retention period on your topics as required to age out the data from Kafka itself.
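As a minimal sketch, a standalone connector configuration for Confluent's S3 sink connector might look like the following (the connector name, topic list, bucket, and region here are placeholder assumptions, not values from your setup):

```properties
# Hypothetical connector name, topics, bucket, and region - adjust for your environment.
name=s3-audit-archive
connector.class=io.confluent.connect.s3.S3SinkConnector
tasks.max=1
topics=orders,payments
s3.region=us-east-1
s3.bucket.name=my-kafka-audit-archive
storage.class=io.confluent.connect.s3.storage.S3Storage
# Write records out as JSON; Avro and Parquet formats are also available.
format.class=io.confluent.connect.s3.format.json.JsonFormat
# Number of records to buffer before committing an object to S3.
flush.size=10000
```

Once the connector is streaming everything to S3, you can shorten the per-topic retention so Kafka only holds the hot data, e.g. `kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name orders --add-config retention.ms=604800000` for seven days.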
Upvotes: 2