Reputation: 577
I'm thinking about building a web-based data logging and visualization service. The basic idea is that at some timed interval something (e.g. a sensor) reports a value (e.g. temperature) to the server. The server records this value into a database. There would be a web-based UI that allows me to view this data on a time-based graph. Ideally this graph would have various resolutions (last 30 seconds, last week, last year, etc). In a super ideal world, I would be able to zoom into the data for any point in time.
The problem is that the sensors are going to generate enormous amounts of data. For example, a sensor that reports a value every 5 seconds will generate about 18k values a day. I'm imagining a system that has thousands of sensors. Over time, this becomes lots of data.
The naive solution is to throw this data into a relational database and retrieve it in the various ways I want, but that won't scale.
The simple solution is to reduce the amount of data by performing periodic roll-ups of the data. New data might go into a table that has data points every 5 seconds. Every hour, some system pumps this data into another table that has data points every minute and the original data is deleted. This repeats for a few levels. The downside to this is that the further back in time you go, the less detailed the data is. That's probably fine. I would imagine that I would need enormous amounts of hardware to support full resolution of data over all time as compared to a system with this sort of rollup.
Is there a better way to do this? Is there an existing solution? I have to imagine this is a fairly common problem.
Upvotes: 0
Views: 208
Reputation: 46
You probably want a fixed sized database like RRDTool: http://oss.oetiker.ch/rrdtool/
Also Graphite is built on top of a similar datastore implementation: http://graphite.wikidot.com/
Upvotes: 1