Reputation: 1938
I have two spouts, both emitting data:
Spout A tuple -> pid, data1, data2, data3
Spout B tuple -> pid, m1, m2
I want to join the data from these two spouts in a bolt on "pid".
Spout A
|------------------> joinBolt ----> pid, data1, data2, data3, m1, m2
Spout B
JoinBolt will combine the data on "pid" and emit the tuple (pid, data1, data2, data3, m1, m2).
JoinBolt joinBolt = new JoinBolt();
BoltDeclarer bd = builder.setBolt("joinBoltId", joinBolt, 5);
bd.fieldsGrouping("spout1Id", "stream1", new Fields("pid"));
bd.fieldsGrouping("spout2Id", "stream2", new Fields("pid"));
If JoinBolt has a parallelism of 5, can I be sure that data from both spouts with the same pid will land on the same instance of JoinBolt?
With a parallelism of 5, there will be 5 instances of JoinBolt (say b1, b2, b3, b4, b5). Is it possible for pid1 from spout1 and pid1 from spout2 to go to different instances of JoinBolt even though I have set fieldsGrouping on pid for both?
Upvotes: 0
Views: 90
Reputation: 46
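If you are using fieldsGrouping on pid for both streams, tuples with the same pid value will go to the same instance of JoinBolt, whichever spout they come from. The target task is chosen from the values of the grouping fields only, so this holds as long as both spouts emit pid as the same type with equal values (for example, the string "1" in one stream and the integer 1 in the other would not be guaranteed to match). FYI, Storm has also added a built-in, window-based JoinBolt: https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/bolt/JoinBolt.java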
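To illustrate why the source spout does not matter, here is a minimal sketch of the idea behind a fields grouping: hash the grouping values and take the result modulo the number of target tasks. This is a simplified stand-in, not Storm's actual implementation; the class name `FieldsGroupingSketch` and the method `chooseTask` are hypothetical names made up for this example.

```java
import java.util.Arrays;
import java.util.List;

public class FieldsGroupingSketch {

    // Hypothetical helper: picks a task index from the grouping values only.
    // Assumption: hash-then-modulo, which is the general idea of a fields grouping.
    static int chooseTask(List<?> groupingValues, int numTasks) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 5; // parallelism of the join bolt

        // Tuples from the two spouts; only the first field (pid) is grouped on.
        List<Object> fromSpoutA = Arrays.asList("pid1", "data1", "data2", "data3");
        List<Object> fromSpoutB = Arrays.asList("pid1", "m1", "m2");

        // Select just the grouping field before hashing.
        int taskA = chooseTask(fromSpoutA.subList(0, 1), numTasks);
        int taskB = chooseTask(fromSpoutB.subList(0, 1), numTasks);

        System.out.println("Spout A tuple goes to task index " + taskA);
        System.out.println("Spout B tuple goes to task index " + taskB);
        System.out.println("Same task: " + (taskA == taskB));
    }
}
```

Because the remaining fields (data1, m1, and so on) are never part of the hash, two tuples with an equal pid always map to the same index in this sketch.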
Upvotes: 1