Reputation: 65
I have read several articles and books about stream processing, and they all assume that when doing event sourcing, all of the past data will still be available in the stream.
On that assumption, we can serve real-time queries by building tables from the stream, as KSQL does.
But in reality, the stream does not always contain every event from the past up to the present. In the case of Kinesis, data is kept for 24 hours by default and for up to seven days at most.
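For reference, I can inspect and extend the retention window through the Kinesis API; here is a minimal boto3 sketch (the stream name "orders" is just a placeholder):

```python
import boto3

kinesis = boto3.client("kinesis")

# "orders" is a hypothetical stream name used for illustration.
summary = kinesis.describe_stream_summary(StreamName="orders")
hours = summary["StreamDescriptionSummary"]["RetentionPeriodHours"]
print(f"current retention: {hours} hours")

# Extend retention to the seven-day figure mentioned above (168 hours).
if hours < 168:
    kinesis.increase_stream_retention_period(
        StreamName="orders",
        RetentionPeriodHours=168,
    )
```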
There are two scenarios that bother me.
Upvotes: 0
Views: 165
Reputation: 968
There are two general techniques that I know of for EDAs (event-driven architectures):
1/ Take regular snapshots from a reliable set of sources (maybe even from the publishers, or by having a dedicated consumer that takes snapshots), and merge them into your internal state as a consumer to come into sync. This option works as long as you're okay with being out of sync for at most a day, or whatever the typical interval between snapshots is (see the first sketch after this list).
2/ Keep an archive of the events in an event store, and replay past messages from the event store to the relevant consumers (second sketch below).
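A rough sketch of option 1, assuming a key-value internal state and events that carry a monotonically increasing sequence number; the function names and event shape are made up for illustration, not tied to any framework:

```python
from typing import Any, Dict, Iterable

# Hypothetical shapes: a snapshot is a full key -> record dump taken at some
# point in time, and each live event has a key, a payload and a sequence number.
State = Dict[str, Dict[str, Any]]

def bootstrap_from_snapshot(snapshot: State) -> State:
    """Start the consumer's internal state from the latest snapshot."""
    return dict(snapshot)

def apply_events(state: State, events: Iterable[Dict[str, Any]],
                 snapshot_seq: int) -> State:
    """Merge live events on top of the snapshot.

    Events at or before the snapshot's sequence number are skipped so that
    what the snapshot already contains is not applied twice.
    """
    for event in events:
        if event["seq"] <= snapshot_seq:
            continue  # already reflected in the snapshot
        state[event["key"]] = event["payload"]
    return state
```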
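And a similar sketch of option 2: replaying an archived event log into a consumer, with a simple processed-ID set to illustrate the deduplication point below. The archive iterable is a stand-in for whatever store you actually use (S3, a database, a compacted topic, etc.), and keeping every ID in memory is deliberately naive:

```python
from typing import Any, Callable, Dict, Iterable, Set

def replay(archive: Iterable[Dict[str, Any]],
           live: Iterable[Dict[str, Any]],
           handle: Callable[[Dict[str, Any]], None]) -> None:
    """Feed archived events first, then the live stream, into one handler.

    Events seen in the archive are remembered by id so the overlap between
    the archive and the live stream is not processed twice.
    """
    seen: Set[str] = set()
    for event in archive:
        seen.add(event["id"])
        handle(event)
    for event in live:
        if event["id"] in seen:
            continue  # already replayed from the archive
        handle(event)
```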
In either case, you'll need to handle deduplication, merging, and filtering of the data based on your particular business case.
Upvotes: 1