Yohn

Reputation: 580

How does Storm manage sharing data on cluster mode?

Recently I've been developing some timing tools for Storm topologies, but I still have some questions about sharing data in a Storm cluster:

  1. If a component (spout/bolt) is configured with more than one executor per worker, say the number of workers is one, the parallelism_hint of the component is 3, and the number of tasks uses the default setting (i.e. 1), does that mean there are 3 instances of the component in the worker? If not, should the fields of the component be accessed in a synchronized block?

  2. If an additional thread named "athread" is created in a component (within the prepare() or open() method), how many "athread" instances are there in the Storm cluster?

  3. As "Understanding the Parallelism of a Storm Topology" says, a worker is a separate process, and a worker process executes a subset of a topology. Does that mean global variables (such as public static fields or other static variables) of the topology can only be shared within one worker?

  4. If a spout's parallelism_hint is set greater than 1 and there is a Utils.sleep(1000) statement in the nextTuple() method, does that mean the number of tuples the spout emits every second equals the number of executors (threads) of the spout?

Thanks very much.

Upvotes: 3

Views: 1194

Answers (1)

user2720864

Reputation: 8171

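1) Setting the parallelism hint to 3 asks Storm to allocate 3 executors (threads) for that component and, by default, creates 3 tasks (if you do not explicitly configure the number of tasks, Storm runs one task per executor). Each task is a separate instance of the component, so with the defaults you get 3 instances, each run by its own thread. Storm never runs more executors than tasks, so if you explicitly set the number of tasks to 1, the component ends up with a single instance run by a single executor thread rather than three threads sharing it. Instance fields therefore need synchronization only if you also touch them from threads you create yourself.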

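2) Assuming you are running with the default settings (1 executor and 1 task per component), there will be a single instance of that thread, because the prepare()/open() method is called only once. More generally, prepare()/open() is called once per task, so you get one such thread per task across the cluster.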
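To make the "one helper thread per task" point concrete, here is a minimal plain-Java sketch. It does not use Storm at all: `TimedComponent`, its `prepare()` method, and `helperThreadsFor` are hypothetical stand-ins that only mimic the fact that prepare()/open() runs once for each task instance.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;

public class HelperThreadSketch {

    /** Hypothetical stand-in for a spout or bolt; NOT a Storm class. */
    static class TimedComponent {
        private final CountDownLatch ran;
        private Thread helper;

        TimedComponent(CountDownLatch ran) {
            this.ran = ran;
        }

        /** Mimics prepare()/open(): called once per task instance. */
        void prepare() {
            // The extra thread from the question ("athread").
            helper = new Thread(ran::countDown, "athread");
            helper.setDaemon(true);
            helper.start();
        }

        Thread helper() {
            return helper;
        }
    }

    /**
     * Creates one component instance per task, calls prepare() on each,
     * and returns how many distinct helper threads were started.
     */
    static int helperThreadsFor(int tasks) {
        CountDownLatch ran = new CountDownLatch(tasks);
        Set<Thread> helpers = new HashSet<>();
        for (int i = 0; i < tasks; i++) {
            TimedComponent component = new TimedComponent(ran);
            component.prepare();
            helpers.add(component.helper());
        }
        try {
            ran.await(); // wait until every helper thread has actually run
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return helpers.size();
    }

    public static void main(String[] args) {
        System.out.println(helperThreadsFor(1)); // prints 1
        System.out.println(helperThreadsFor(3)); // prints 3
    }
}
```

Under that assumption, the number of "athread" instances in the cluster tracks the number of tasks of the component, wherever those tasks happen to be scheduled.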

3) Static variables are shared by all the instances of a component within a given worker, because a worker is a single JVM process. They are not shared across workers.
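A small plain-Java sketch of what that sharing looks like inside one JVM. `CountingComponent` and `runWorker` are hypothetical names, not Storm APIs; each thread stands in for an executor running its own component instance, and the single JVM stands in for one worker.

```java
import java.util.concurrent.atomic.AtomicLong;

public class StaticSharingSketch {

    /** Hypothetical stand-in for a bolt; NOT a Storm class. */
    static class CountingComponent {
        // One copy per JVM (i.e. per worker), visible to every instance.
        static final AtomicLong SHARED = new AtomicLong();
        // One copy per instance; only the owning thread writes it.
        long own = 0;

        void execute() {
            SHARED.incrementAndGet(); // atomic, since several threads hit it
            own++;
        }
    }

    /**
     * Simulates one worker: one instance per task, each driven by its own
     * thread. Returns {increase of the static counter, count of instance 0}.
     * Assumes tasks >= 1.
     */
    static long[] runWorker(int tasks, int tuplesPerTask) {
        long before = CountingComponent.SHARED.get();
        CountingComponent[] components = new CountingComponent[tasks];
        Thread[] executors = new Thread[tasks];
        for (int i = 0; i < tasks; i++) {
            CountingComponent component = new CountingComponent();
            components[i] = component;
            executors[i] = new Thread(() -> {
                for (int n = 0; n < tuplesPerTask; n++) {
                    component.execute();
                }
            });
            executors[i].start();
        }
        for (Thread executor : executors) {
            try {
                executor.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        long sharedDelta = CountingComponent.SHARED.get() - before;
        return new long[] { sharedDelta, components[0].own };
    }

    public static void main(String[] args) {
        long[] result = runWorker(3, 1000);
        System.out.println(result[0] + " " + result[1]); // prints "3000 1000"
    }
}
```

The static counter sees the updates from all three instances, while each instance field only sees its own. A second worker would be a second JVM with its own, independent copy of the static field, so anything that must be shared cluster-wide needs an external store rather than a static variable.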

4) Not sure what exactly you mean. If you are running with multiple executors, each one calls nextTuple() in its own thread, so while Thread-A is sleeping, another Thread-B might be processing a tuple.

Upvotes: 2
