Atom
Atom

Reputation: 616

Kafka KStream to KStream join | restart performance

I'm planning on joining two topics as KStreams over a long window (~1week). Assuming there will be hundreds of millions of records accumulated in this window, how long will the joining consumer take to restart? I'm asking this because I was unable to find the information regarding how many of the records from the window are stored in the consumer cache.

Upvotes: 0

Views: 196

Answers (1)

Matthias J. Sax
Matthias J. Sax

Reputation: 62350

By default, data that is buffered in a window is stored in RocksDB, ie, local disk. Hence, on restart (on the same machine) nothing needs to be re-loaded as the data is already available.

If you restart on a different machine, the whole content of the store would need to be re-read from a Kafka topic (that backs up the store to guarantee fault-tolerance). How long this takes depends on many factors and it's hard to estimate. You can register a "restore callback" though to monitor the restore process. This should give you some way to run some experiments to get insight how long it may take.

Upvotes: 2

Related Questions