Reputation: 155
I am designing a prototype realtime monitor for processing fairly large amounts (>30G/day) of streaming numeric data. I would like to write this in Clojure, as the language seems to be well suited to the kind of "Observer + state machine" system that this will probably end up as.
The two main candidates I have found for a framework are Lamina and Storm. There is also Riemann and Pulse, but the former seems to be more of a full solution rather than a framework, and I'd rather not commit to a final design yet; Pulse's repo looks a little unmaintained?
What I would like to know is; what kinds of data- and work flow are these two projects optimised for? Storm seems to be more mature, but Lamina seems more composable and "Clojureic" (my background is Python, so I tend to rate this highly).
What I've found from reading online:
Storm seems to be Big Data(stream) focussed, the core is straight Java with a Clojure DSL. It appears to have pre=built handlers for a number of existing data sources.
Lamina is more a lightweight, reusable component that does the Clojure thing of coding to abstractions, meaning it can be reused as a base for other eventing systems. The data sources need to be handled in code.
Both have a useful set of aggregation/splitting/computation library functions out of the box. Lamina's graphviz integration is a nice touch.
Upvotes: 5
Views: 808
Reputation: 92117
Storm probably isn't a bad choice, but "over 30GB per day" of numeric data isn't big data, it is tiny data. Any semi-modern computer can handle that much data easily on one node with lamina. You might want to go with Storm anyway, so that once you do get into a realm where you need more servers you can scale easily, but I imagine there's some initial friction to getting Storm set up (and some ongoing friction in maintaining the cluster), which will be wasted if you never have to scale up.
Upvotes: 8
Reputation: 8078
Lamina seems like an okay choice, but it appears to be totally lacking the killer feature of Storm--cluster computing management. A Storm cluster will take care of most of the dirty work of distributing your computation across a cluster of nodes, leaving you to just focus on your business logic so long as you fit it within the Storm framework. Lamina, from what I can see, provides a nice way to organize your computation, but then you'll have to take care of all the details of scaling that out if that's something you need.
Upvotes: 1
Reputation: 91587
Storm incorporates cluster management and handling of failed nodes in the flow because it was designed to be sort of "like Hadoop but for streaming", which from what I understand of your requirements seems to be closer to your use case.
Upvotes: 4