Reputation: 245
We are planning to run a Kafka Streams application distributed across two machines. Each instance stores its KTable data on its own machine. The challenge we face here is,
Proposed Solution: We are thinking of having a shared location (state.dir) for these two instances, so that both instances store their KTable data in the same directory. The idea is to get all the data from a single instance, without interactive queries, by just calling,
final ReadOnlyKeyValueStore<Key, Result> allDataFromTwoInstance =
    streams.store("result",
        QueryableStoreTypes.<Key, Result>keyValueStore());

try (KeyValueIterator<Key, Result> iterator = allDataFromTwoInstance.all()) {
    while (iterator.hasNext()) {
        KeyValue<Key, Result> entry = iterator.next();
        // append to excel report
    }
}
Question: Will the above solution work without any issues? If not, is there any alternative solution for this?
Please suggest. Thanks in advance.
Upvotes: 1
Views: 1249
Reputation: 62285
This will not work. Even if you have a shared state.dir, each instance only loads its own share/shard of the data and is not aware of the other data.
I think you should use GlobalKTable to get a full local copy of the data.
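A minimal sketch of the GlobalKTable approach. The topic name "result-topic", the application id, the broker address, and the use of String key/value types are all assumptions for illustration; the question's Key/Result serdes would go in the Consumed/Materialized arguments. The store call mirrors the question's API; on newer Kafka versions it is `streams.store(StoreQueryParameters.fromNameAndType(...))`.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class GlobalTableReport {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "report-app");        // assumed id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker

        StreamsBuilder builder = new StreamsBuilder();

        // A GlobalKTable is populated from ALL partitions of the topic on every
        // instance, so each node holds a complete local copy of the data.
        GlobalKTable<String, String> resultTable = builder.globalTable(
                "result-topic",                                   // assumed topic name
                Consumed.with(Serdes.String(), Serdes.String()),
                Materialized.as("result"));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Querying "result" on either instance now returns the full dataset,
        // so the report can be built from a single node:
        ReadOnlyKeyValueStore<String, String> store =
                streams.store("result", QueryableStoreTypes.keyValueStore());
        try (KeyValueIterator<String, String> it = store.all()) {
            while (it.hasNext()) {
                KeyValue<String, String> entry = it.next();
                // append entry.key / entry.value to the report
            }
        }

        streams.close();
    }
}
```

The trade-off is exactly what the next answer notes: every instance now stores the entire table, which is fine for small datasets but not if the data cannot fit on one node.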
Upvotes: 3
Reputation: 4314
GlobalKTable is the most natural first choice, but it means each node where the global table is defined contains the entire dataset.
The other alternative that comes to mind is indeed to stream the data between the nodes on demand. This makes sense especially if creating the report is an infrequent operation, or when the dataset cannot fit on a single node. Basically, you can follow the documentation guidelines for querying remote Kafka Streams nodes here:
and for RPC use a framework that supports streaming, e.g. akka-http.
Server-side streaming:
http://doc.akka.io/docs/akka-http/current/java/http/routing-dsl/source-streaming-support.html
Consuming a streaming response:
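With the data left sharded, the reporting node first has to discover which instances host the store. A sketch of that discovery step using the Streams metadata API, assuming `streams` is a running KafkaStreams instance whose `application.server` config was set to `host:port` on each node, and "result" is the store name from the question (on newer Kafka versions the method is `streamsMetadataForStore`):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.state.StreamsMetadata;

public class StoreDiscovery {
    // Returns one "host:port" endpoint per instance hosting a shard of the store.
    // The caller would then issue a streaming RPC (e.g. via akka-http) to each
    // endpoint, have it iterate its local shard, and merge the responses.
    static List<String> storeEndpoints(KafkaStreams streams, String storeName) {
        List<String> endpoints = new ArrayList<>();
        for (StreamsMetadata node : streams.allMetadataForStore(storeName)) {
            endpoints.add(node.host() + ":" + node.port());
        }
        return endpoints;
    }
}
```

Each endpoint serves only its own shard, so the merged responses together cover the full dataset without any node holding a complete copy.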
Upvotes: 3