Reputation: 701
I need to know how Storm manages the number of parallel workers for each bolt. Neither the IRichBolt interface nor the IRichSpout interface extends Runnable. How does Storm manage multithreading?
Upvotes: 3
Views: 814
Reputation: 8171
It's a bit too broad to cover fully, but here's what I can share. In brief, spouts and bolts in Storm are the components that actually process the data. Each running instance of a spout or bolt is known in Storm terminology as a task (so a parent interface such as IRichSpout doesn't need to implement something like Runnable). The threads responsible for carrying out these tasks are called executors. From the doc:
in Storm’s terminology "parallelism" is specifically used to describe the so-called parallelism hint, which means the initial number of executor (threads) of a component (spout or bolt)
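To make the task/executor split concrete, here is a minimal sketch in plain Java (not Storm code). The names ExecutorSketch and SketchBolt are made up for illustration, and the round-robin dispatch is an assumption that keeps the example short; in Storm, stream groupings decide which task receives a tuple. The point is only that the thread belongs to the executor, so the bolt-like object never has to be a Runnable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for an executor: one thread that drives its assigned tasks.
final class ExecutorSketch implements Runnable {

    // Hypothetical stand-in for a bolt. Note that it is NOT a Runnable.
    interface SketchBolt {
        void execute(String tuple);
    }

    private final List<SketchBolt> tasks;
    private final List<String> input;

    ExecutorSketch(List<SketchBolt> tasks, List<String> input) {
        this.tasks = tasks;
        this.input = input;
    }

    @Override
    public void run() {
        // The executor's single thread calls execute() on its tasks.
        // Round-robin is an illustrative assumption, not Storm's routing.
        int i = 0;
        for (String tuple : input) {
            tasks.get(i++ % tasks.size()).execute(tuple);
        }
    }

    // Starts one executor thread with numTasks tasks and returns what the tasks saw.
    static List<String> runOnce(int numTasks, List<String> input) {
        List<String> seen = new CopyOnWriteArrayList<>();
        List<SketchBolt> tasks = new ArrayList<>();
        for (int t = 0; t < numTasks; t++) {
            final int taskId = t;
            tasks.add(tuple -> seen.add("task-" + taskId + ":" + tuple));
        }
        Thread executor = new Thread(new ExecutorSketch(tasks, input), "executor-0");
        executor.start();
        try {
            executor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        // prints [task-0:a, task-1:b, task-0:c]
        System.out.println(runOnce(2, List.of("a", "b", "c")));
    }
}
```

Here one thread serves two tasks, which mirrors a component whose task count is higher than its executor count.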
These executors (threads) are in turn spawned by a worker process. From the doc:
A worker process executes a subset of a topology. A worker process belongs to a specific topology and may run one or more executors for one or more components (spouts or bolts) of this topology
A machine in a Storm cluster may run one or more such worker processes for one or more topologies, and each worker process runs executors only for the topology it belongs to. (You can even change the number of executors at run time using Storm's rebalancing mechanism.)
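As a rough illustration of what rebalancing changes: a component's number of tasks stays fixed for the lifetime of the topology, while the number of executors can vary, so each executor ends up running more or fewer tasks. The sketch below is plain Java with a made-up class name (RebalanceSketch), and the even, contiguous split is an assumption for illustration rather than Storm's actual scheduler code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: shows how a fixed set of tasks could be spread over executors.
final class RebalanceSketch {

    // Splits task ids 0..numTasks-1 into numExecutors contiguous groups, as evenly as possible.
    static List<List<Integer>> assign(int numTasks, int numExecutors) {
        if (numExecutors < 1 || numExecutors > numTasks) {
            // An executor with no task would have nothing to run.
            throw new IllegalArgumentException("need 1 <= executors <= tasks");
        }
        List<List<Integer>> groups = new ArrayList<>();
        int next = 0;
        for (int e = 0; e < numExecutors; e++) {
            // The first (numTasks % numExecutors) executors take one extra task.
            int size = numTasks / numExecutors + (e < numTasks % numExecutors ? 1 : 0);
            List<Integer> group = new ArrayList<>();
            for (int k = 0; k < size; k++) {
                group.add(next++);
            }
            groups.add(group);
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(assign(4, 2)); // prints [[0, 1], [2, 3]]
        System.out.println(assign(4, 4)); // prints [[0], [1], [2], [3]]
    }
}
```

Going from two executors to four leaves the four tasks untouched; only the number of threads sharing them changes.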
For internal communication within these worker processes, Storm uses various message queues backed by the LMAX Disruptor. Each worker also maintains its own threads, such as a receiver thread and a sender thread, for managing incoming and outgoing messages.
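The queue hand-off between threads can be sketched roughly as follows. This is plain Java, and it deliberately swaps in java.util.concurrent.ArrayBlockingQueue for the Disruptor ring buffer, so it shows only the shape of the pattern (one thread publishes tuples, another consumes them), not Storm's implementation. QueueHandoffSketch and the stop marker are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: a bounded queue between a producing and a consuming thread.
final class QueueHandoffSketch {

    // Marker telling the consumer to stop (an assumption of this sketch).
    private static final String STOP = "__stop__";

    // Sends tuples through a queue to a consumer thread and returns what it processed.
    static List<String> pipe(List<String> tuples) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(8);
        List<String> received = new ArrayList<>();

        // Plays the role of a bolt executor draining its incoming queue.
        Thread consumer = new Thread(() -> {
            try {
                for (String t = queue.take(); !t.equals(STOP); t = queue.take()) {
                    received.add(t.toUpperCase());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "bolt-executor");
        consumer.start();

        try {
            // The calling thread plays the role of the upstream (spout) executor.
            for (String t : tuples) {
                queue.put(t);
            }
            queue.put(STOP);
            consumer.join(); // join() makes the consumer's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(pipe(List.of("a", "b", "c"))); // prints [A, B, C]
    }
}
```

The bounded capacity means a fast producer blocks when the consumer falls behind, which is one simple way to picture back-pressure between threads.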
You can take a look at this doc page for a better overview, and at this very nice article explaining how Storm handles parallelism. These might help you dig further; do share your findings :)
Upvotes: 5