Understanding some concepts of Hazelcast Jet integrated with Kafka

Question

I'm trying to map some concepts between Spark Structured Streaming and Hazelcast Jet, and to understand some other subjects as well.

Q1 - In Spark, each Kafka partition becomes a partition inside Spark, and these partitions are then processed in parallel by individual tasks. I think I've read somewhere that Hazelcast Jet merges all the messages from Kafka regardless of the group.id and the topic partitions. Is that correct?
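
To make Q1 concrete, this is the Spark behavior I have in mind, written as a toy Java model. It is plain Java with no Spark or Kafka API involved; the task labels and the topic name "events" are made up, and any Spark settings that change this one-to-one mapping are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the Spark behavior described in Q1: every Kafka partition
// becomes one Spark partition, and each Spark partition is handled by its own task.
// "task-N" is only a label; this is not Spark API code.
class SparkKafkaModel {

    // Returns one task description per Kafka partition (1:1 mapping).
    static List<String> tasksFor(String topic, int kafkaPartitions) {
        List<String> tasks = new ArrayList<>();
        for (int p = 0; p < kafkaPartitions; p++) {
            tasks.add("task-" + p + " <- " + topic + "-" + p);
        }
        return tasks;
    }

    public static void main(String[] args) {
        // A topic with 3 partitions yields 3 tasks that can run in parallel.
        tasksFor("events", 3).forEach(System.out::println);
    }
}
```

Running main prints one line per partition, from `task-0 <- events-0` to `task-2 <- events-2`.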

Q2 - How do we increase the number of "consumers" in a Jet program to raise throughput when consuming from Kafka? In Spark, I guess we only need to increase the number of topic partitions so that a new Spark task is assigned to each new partition.
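
The throughput ceiling I have in mind for Q2 comes from how a plain Kafka consumer group works: each partition is consumed by at most one consumer in the group, so consumers beyond the partition count sit idle. A toy Java model of that rule (plain arithmetic, no Kafka or Jet API) is below; whether Jet's Kafka source is bound by the same limit is part of what I'm asking.

```java
// Toy model of a plain Kafka consumer group: each partition is owned by at
// most one consumer in the group, so consumers beyond the partition count are idle.
class ConsumerScalingModel {

    // Number of consumers that actually receive partitions.
    static int activeConsumers(int consumers, int partitions) {
        return Math.min(consumers, partitions);
    }

    public static void main(String[] args) {
        int partitions = 4;
        for (int consumers : new int[] {2, 4, 8}) {
            System.out.println(consumers + " consumers, " + partitions
                    + " partitions -> " + activeConsumers(consumers, partitions) + " active");
        }
    }
}
```

With 4 partitions, going from 4 to 8 consumers still leaves only 4 of them active, which is why in Spark I assumed the partition count is the knob to turn.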

Q3 - If Q1 above is true, is it possible to avoid that merge and distribute the Kafka partitions so that they are processed in parallel? Since the messages are already grouped and ordered within a Kafka partition, merging them all implies extra processing to re-partition and sort the messages again.
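
What I'm hoping for in Q3 can be sketched as a toy Java model: whole Kafka partitions are spread over N parallel workers, and each worker reads its partitions in offset order, so no cross-partition merge or re-sort is needed. The round-robin rule (partition % workers) is purely my own assumption for illustration; I don't know which assignment strategy Jet actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of the behavior Q3 asks about: each Kafka partition is owned by
// exactly one worker, so per-partition order is kept without any global merge.
// Round-robin assignment is a hypothetical strategy, not Jet's documented one.
class PartitionAssignmentModel {

    // Maps worker index -> list of partitions it owns.
    static Map<Integer, List<Integer>> assign(int partitions, int workers) {
        Map<Integer, List<Integer>> byWorker = new TreeMap<>();
        for (int w = 0; w < workers; w++) {
            byWorker.put(w, new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            byWorker.get(p % workers).add(p);
        }
        return byWorker;
    }

    public static void main(String[] args) {
        // 6 partitions over 4 workers: workers 0 and 1 get two partitions each.
        assign(6, 4).forEach((w, ps) -> System.out.println("worker-" + w + " -> partitions " + ps));
    }
}
```

With 6 partitions and 4 workers, main prints `worker-0 -> partitions [0, 4]` down to `worker-3 -> partitions [3]`; with more workers than partitions, some workers would own no partition at all.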

Q4 - How is the number of instances of each vertex defined? I mean, in the word count example we have the tokenizer and the accumulator vertices; how does Jet decide how many processor instances of the tokenizer and of the accumulator to create?
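
For Q4, this is the arithmetic I'm currently assuming, written as a toy Java model: Jet creates `localParallelism` processor instances of a vertex on every cluster member, the default local parallelism equals the member's CPU core count, and I believe it can be overridden per vertex with `Vertex.localParallelism(int)`. These are assumptions I'd like confirmed; the 3-member cluster is hypothetical.

```java
// Toy arithmetic for Q4, assuming (to be confirmed) that Jet creates
// `localParallelism` instances of each vertex on every cluster member and
// that the default local parallelism equals the member's CPU core count.
class VertexParallelismModel {

    // Assumed default when a vertex does not set its own local parallelism.
    static int defaultLocalParallelism() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Total processor instances of one vertex across the whole cluster.
    static int totalInstances(int localParallelism, int members) {
        return localParallelism * members;
    }

    public static void main(String[] args) {
        int members = 3; // hypothetical cluster size
        int lp = defaultLocalParallelism();
        System.out.println("tokenizer instances:   " + totalInstances(lp, members));
        System.out.println("accumulator instances: " + totalInstances(lp, members));
        // A vertex with an explicit local parallelism of 1 gets one instance per member.
        System.out.println("explicit lp=1:         " + totalInstances(1, members));
    }
}
```

Under these assumptions, an 8-core, 3-member cluster would run 24 tokenizer and 24 accumulator instances.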
