Reputation: 621
I keep the default 7 days of streaming data in Kafka:
log.retention.hours=168
When I deploy a new version of my Streams application, it takes a significant amount of time to process the old data before the application is actually usable.
Are there any options to make it quicker other than reducing the retention period?
One idea that comes to mind is to avoid persisting state stores to disk until all of the old data has been processed.
Upvotes: 0
Views: 823
Reputation: 621
What I finally came up with is to process only the last N hours of the original data in my Streams application, using a filter:
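import java.util.Calendar

// Keep only records whose timestamp falls within the last `streamHours` hours.
myStream.filter({ (_, value) =>
  val calendar = Calendar.getInstance()
  calendar.add(Calendar.HOUR, -streamHours) // cutoff = now - streamHours
  value.timestamp > calendar.getTimeInMillis
})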
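For what it's worth, the cutoff check can be pulled out into a small pure function so it can be tested without a running topology. This is only a sketch: `Event` and `RecentFilter` are hypothetical names, and I'm assuming the value carries a `timestamp` in epoch milliseconds, as in the snippet above. Note that the filter only skips processing; the old records are still read from the topic.

```scala
import java.util.concurrent.TimeUnit

// Hypothetical record type; the only assumption is an epoch-millis `timestamp` field.
final case class Event(timestamp: Long)

object RecentFilter {
  // True when the event is strictly newer than (nowMillis - hours).
  def isRecent(event: Event, hours: Long, nowMillis: Long): Boolean =
    event.timestamp > nowMillis - TimeUnit.HOURS.toMillis(hours)
}
```

In the stream it would then be used roughly as `myStream.filter((_, value) => RecentFilter.isRecent(value, streamHours, System.currentTimeMillis()))`, assuming the value type matches.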
Upvotes: 0
Reputation: 107
I'm guessing you have state stores with changelog topics in your app, and the part that takes time is restoring the app's state?
Upvotes: 2