Reputation: 633
1) What happens if Nimbus fails? Can we convert some other node into a Nimbus?
2) Where is the output of a topology stored? When a bolt emits a tuple, where is it stored?
3) What happens if Zookeeper fails?
Upvotes: 1
Views: 616
Reputation: 1072
Nimbus is itself a failure-tolerant process, which means it doesn't keep its state in memory but in an external store (Zookeeper). So if Nimbus crashes (an unlikely scenario), it will resume where it stopped on the next start. Nimbus should normally be set up to run under an external monitoring system, such as Monit, which checks the state of the Nimbus process periodically and restarts it if a problem occurs. I suggest you read the Storm project's wiki for further information.
Nimbus is the master node of a Storm cluster, and it isn't possible to have multiple Nimbus nodes. (Update: as of 5/2014, the Storm community is actively working on making the Nimbus daemon fault tolerant in a failover manner, by having multiple Nimbuses heartbeat each other.)
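To make the "monitor and restart" idea concrete, here is a minimal sketch of a restart loop in Python. It is not how Monit works internally and it is no substitute for a real supervisor. The function name `supervise` and its parameters are my own, chosen for illustration only.

```python
import subprocess
import time


def supervise(command, max_restarts=3, backoff_seconds=1.0):
    """Run `command` and restart it every time it exits.

    Hypothetical helper: the process is started at most max_restarts + 1
    times. Returns the list of exit codes seen, oldest first.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        process = subprocess.Popen(command)
        # Block until the supervised process terminates, then record why.
        exit_codes.append(process.wait())
        if attempt < max_restarts:
            # Pause briefly so a crash loop does not spin the CPU.
            time.sleep(backoff_seconds)
    return exit_codes
```

Assuming the `storm` launcher is on the PATH, something like `supervise(["storm", "nimbus"])` would keep bringing Nimbus back up. Because the cluster state lives outside the process, a restarted Nimbus can pick up where the previous one left off.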
The tuple is "stored" in the tuple tree, and it is passed to the next bolt in the topology's execution chain as execution progresses. As for physical storage, tuples are probably held in an in-memory structure and serialized as necessary to be distributed among the cluster's nodes. The state of the Storm cluster itself is stored in Zookeeper. Storm doesn't concern itself with persistent storage of a topology's or a bolt's output; it is your job to persist the results of the processing.
As with Nimbus, Zookeeper in a real production Storm cluster must be configured for reliability, and for Zookeeper that means running an odd number of Zookeeper nodes on different servers. You can find more information on configuring a production Zookeeper cluster in the Zookeeper Administrator's Guide. If Zookeeper were to fail (although this is highly unlikely in a properly configured Zookeeper cluster), the Storm cluster wouldn't be able to continue processing, since all of the cluster's state is stored in Zookeeper.
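As a rough illustration of "it is your job to persist the results", the sketch below shows the kind of work a final bolt would do. It is plain Python and does not use the Storm API. The class `PersistingSink`, its `execute` method, and the `word_counts` table are all hypothetical names, and SQLite is only a stand-in for whatever datastore you actually use.

```python
import sqlite3


class PersistingSink:
    """Hypothetical stand-in for a terminal bolt that saves its input."""

    def __init__(self, db_path=":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS word_counts ("
            "word TEXT PRIMARY KEY, count INTEGER NOT NULL)"
        )

    def execute(self, word, count):
        # Called once per incoming tuple. The write replaces any earlier
        # row for the same word, so a replayed tuple does not duplicate it.
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO word_counts (word, count) VALUES (?, ?)",
                (word, count),
            )

    def read_all(self):
        # Return everything persisted so far as a {word: count} dict.
        return dict(self.conn.execute("SELECT word, count FROM word_counts"))
```

Writes keyed on the word are idempotent, which matters if tuples can be delivered more than once.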
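The reason for an odd number of nodes comes down to majority arithmetic: a Zookeeper ensemble stays available only while a strict majority of its servers are up. A small sketch (the function names are mine, for illustration only):

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5, 6):
    print(n, "servers: quorum", quorum_size(n),
          "tolerates", tolerated_failures(n), "failure(s)")
```

Going from 3 to 4 servers (or from 5 to 6) does not increase the number of failures the ensemble can survive, so the extra even-numbered node adds cost without adding fault tolerance.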
Upvotes: 4
Reputation: 1072
Regarding question 1), this bug report and the subsequent comment from Storm author and maintainer Nathan Marz clarify the issue:
Storm is not designed for having topologies partially running. When you bring down the master, it is unable to reassign failed workers. We are working on Nimbus failover. Nimbus is fault-tolerant to the process restarting, which has made it fault-tolerant enough for our and most people's use cases.
Upvotes: 1