Reputation: 21
I need to use the Kafka Streams DSL cache to reduce the amount of write volume to downstream processors. However, our app processes tombstones, which introduces a complication. For example, given the following records for a single key, K1:
<K1, V1>
<K1, V2>
<K1, V3>
The DSL cache may only emit the final record of:
<K1, V3>
With the DSL cache turned off, of course, it would emit all of the intermediate records:
<K1, V1>
<K1, V2>
<K1, V3>
Everything is working as expected so far. But, with tombstones, the raw sequence becomes:
<K1, V1>
<K1, V2>
<K1, V3>
<K1, NULL>
So, depending on when the cache is flushed, we may never see the final count. E.g.:
<K1, V1> | cached
<K1, V2> | flushed
<K1, V3> | cached
<K1, NULL> | deleted
would mean <K1, V2> is flushed, but <K1, V3> never is. The semantics I'm trying to achieve involve flushing the latest record for a given key in the cache whenever a tombstone is received for that key:
<K1, V1> | cached
<K1, V2> | flushed
<K1, V3> | cached
<K1, NULL> | emit the latest record (`<K1, V3>`), then delete.
I have not been able to do this with the DSL, and the Processor API doesn't expose the underlying cache, so I can't do it there either. I'm considering implementing a custom in-memory cache and using that with the Processor API, but it gets complicated: it seems like there could be data loss if the app is shut down ungracefully (e.g. SIGKILL). I'm not sure how the DSL cache handles ungraceful shutdowns either (maybe there's data loss there too), so perhaps my implementation could be modeled after the DSL cache.
Anyway, am I overthinking this problem? Is there a way to flush the latest record from the DSL cache when a tombstone is received, instead of implementing a custom cache?
Upvotes: 2
Views: 379
Reputation: 62350
we may never see the final count
I understand what you are saying; however, in this case the "final" record is the tombstone, so you do see the final one. What you want is a specific intermediate result. The DSL does not allow such fine-grained configuration.
the Processor API doesn't expose the underlying cache
Well, it does. Using Stores.keyValueStoreBuilder() you can call withCachingEnabled() on the returned StoreBuilder. Note that in this case, by default no records are emitted downstream, and you need to implement the emit logic manually. I.e., you don't know when the cache is flushed, and when it is flushed, it only flushes to local disk and the changelog topic; no data is emitted downstream on flush.
You could register a punctuation to emit data at regular time intervals. Also, each time you process a tombstone, you can emit the currently stored value for that key before you delete it from the store.
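A minimal, plain-Java sketch of that emit-on-tombstone logic, with a HashMap standing in for the caching state store and a list standing in for forwarding downstream (all names here are illustrative, not Kafka Streams API; in a real Processor the punctuator would periodically drain the store as well):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the desired semantics: buffer the latest value per key,
// and on a tombstone, emit that latest value before deleting it.
class TombstoneEmitSketch {
    final Map<String, String> store = new HashMap<>();   // stands in for the state store
    final List<String> emitted = new ArrayList<>();      // stands in for context.forward()

    void process(String key, String value) {
        if (value == null) {                             // tombstone
            String latest = store.get(key);
            if (latest != null) {
                emitted.add(key + "=" + latest);         // emit the latest cached value...
            }
            store.remove(key);                           // ...then delete it from the store
        } else {
            store.put(key, value);                       // buffer; emit later, e.g. from a punctuator
        }
    }
}
```

With the sequence from the question (V1, V2, V3, then a tombstone for K1), this emits only `K1=V3` at the tombstone, which is the behavior being asked for.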
Upvotes: 1