Reputation: 581
I have a hierarchy of processes called "monitor_node". Each of these monitor_nodes is supervised by one supervisor.
Now, each of these nodes may have a complex inner structure: it may (or may not) have subprocesses that it needs in order to operate properly, for example a process that sends keep-alive messages. So far I have been using plain spawn_link to create these "internal" processes.
However, I have realized that spawning them in the init function of monitor_node (which is being supervised) sometimes causes this function to fail (and therefore the whole supervision tree fails). My question is: would it be a good solution to attach these internal processes to the supervision tree? I am thinking about changing monitor_node into a supervisor that supervises its internal processes.
My doubts are:
I would have to supervise a fairly large number of very small processes. I am not sure if this is good practice.
I may not know in advance whether a given "internal" process is a simple process or has some internal structure of its own (that is, it also spawns other processes). If the latter is the case, then I probably should attach these "inner-inner" processes to the supervision tree as well.
I hope I have not confused you too much. Looking forward to an answer.
EDIT:
A very similar (if not the same) problem is discussed here (3rd post). The solution given there is pretty much the same as the one that I GIVE CRAP ANSWERS gave.
Upvotes: 0
Views: 865
Reputation: 18879
There is a trick here, which includes the use of two supervisors. Your tree goes like:
main_sup -> worker
main_sup -> attached_pool_sup
attached_pool_sup -> workers
main_sup is one_for_all, so if the worker or the pool supervisor dies, then the whole tree is done for and killed off. The pool supervisor is simple_one_for_one, a strategy that is suitable for having hundreds or thousands of workers.
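To make the tree above concrete, here is a minimal sketch of both supervisors in a single module. The names node_sup, pool_sup and internal_worker are hypothetical, as are the restart intensities; it also assumes that monitor_node exports start_link/0 and that internal_worker exports start_link/1.

```erlang
-module(node_sup).
-behaviour(supervisor).

-export([start_link/0, start_internal/2]).
-export([init/1]).

%% Starts the top-level (one_for_all) supervisor.
start_link() ->
    supervisor:start_link(?MODULE, main).

%% Starts one internal process under the pool supervisor.
%% Do not call this from the worker's init/1: the main supervisor is
%% still blocked there, so which_children/1 would deadlock.
start_internal(MainSup, Args) ->
    Children = supervisor:which_children(MainSup),
    {pool_sup, Pool, _Type, _Mods} = lists:keyfind(pool_sup, 1, Children),
    supervisor:start_child(Pool, [Args]).

%% main_sup: if either child dies, both are restarted together.
init(main) ->
    SupFlags = #{strategy => one_for_all, intensity => 1, period => 5},
    Pool = #{id => pool_sup,
             start => {supervisor, start_link, [?MODULE, pool]},
             type => supervisor},
    Worker = #{id => monitor_node,
               start => {monitor_node, start_link, []},
               type => worker},
    %% The pool comes first so that it exists before the worker starts.
    {ok, {SupFlags, [Pool, Worker]}};
%% attached_pool_sup: children are added dynamically via start_child/2.
init(pool) ->
    SupFlags = #{strategy => simple_one_for_one, intensity => 10, period => 10},
    Child = #{id => internal_worker,
              start => {internal_worker, start_link, []},
              restart => transient},
    {ok, {SupFlags, [Child]}}.
```

With this layout the worker would call node_sup:start_internal(MainSup, Args) once it is up and running, instead of using spawn_link directly. How the worker learns MainSup (a registered name, an argument, and so on) is left open here.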
Don't do too much work in your init callback. The supervisor waits until init completes; you can set a start timeout (which you can increase in your case) if it takes longer than normal.
A trick is to time out immediately (return a timeout of 0 from init) and then handle the additional setup in the handle_info clause for the timeout message. That way you won't be holding up the main supervisor. Beware of races here: another message may arrive before the timeout fires.
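A rough sketch of that deferred setup, assuming monitor_node is a gen_server. The state shape, the status call and ensure_ready/1 are made up for illustration. Because a message that is already in the mailbox is handled before the zero timeout fires, every callback runs the setup first if it has not happened yet.

```erlang
-module(monitor_node).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% Return at once so the supervisor is not blocked; the 0 timeout makes
%% gen_server deliver 'timeout' to handle_info/2 if the mailbox is empty.
init([]) ->
    {ok, #{ready => false}, 0}.

handle_info(timeout, State) ->
    {noreply, ensure_ready(State)};
handle_info(_Other, State) ->
    {noreply, ensure_ready(State)}.

%% Guard against the race: a call may arrive before the timeout message.
handle_call(status, _From, State0) ->
    State = ensure_ready(State0),
    {reply, maps:get(ready, State), State}.

handle_cast(_Msg, State) ->
    {noreply, ensure_ready(State)}.

%% Hypothetical slow setup (e.g. starting the keep-alive process);
%% it runs at most once.
ensure_ready(#{ready := true} = State) ->
    State;
ensure_ready(#{ready := false} = State) ->
    State#{ready := true}.
```

On OTP 21 and later, returning {ok, State, {continue, Term}} from init and doing the setup in handle_continue/2 avoids this race, since the continue step runs before any other message is processed.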
Upvotes: 2