Reputation: 3743
I'm looking for a "best practice" way to handle incoming time series data.
One data point consists of, for example, time, height, width, etc. for every "tick". Is it a good idea to keep n data points in memory in a collection class and later "flush" them to a database once the collection reaches its limit?
Or should the data points be written directly to the database in the first place, so that my object can run queries against it?
I know this is not much information about my requirements, so the question is: how fast is data access to a database compared to a hybrid in-memory and database solution?
Say there are at most 500 data points per second to handle, and some calculation has to be performed on every incoming point. With a pure database solution, one has to run a store (insert) query for every incoming point. I guess this is not efficient, but I don't know whether such a database is able to "listen" and do this fast.
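The hybrid approach described above (collect n points in memory, then flush them in one go) could look roughly like this minimal sketch. It assumes Python's built-in `sqlite3` as a stand-in for SQL Server; the `DataPoint` fields, the `ticks` table and the `BufferedWriter` class are hypothetical names chosen for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class DataPoint:
    # Hypothetical fields; the question only says "time, height, width etc."
    time: float
    height: float
    width: float


class BufferedWriter:
    """Keeps up to `capacity` points in memory and writes them as one batch."""

    def __init__(self, conn: sqlite3.Connection, capacity: int = 500):
        self.conn = conn
        self.capacity = capacity
        self.buffer = []  # pending DataPoint objects, not yet in the database
        conn.execute(
            "CREATE TABLE IF NOT EXISTS ticks (time REAL, height REAL, width REAL)"
        )

    def add(self, point: DataPoint) -> None:
        # Per-point calculations can run here against the in-memory buffer.
        self.buffer.append(point)
        if len(self.buffer) >= self.capacity:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        # One transaction and one executemany call instead of one INSERT per tick.
        with self.conn:
            self.conn.executemany(
                "INSERT INTO ticks (time, height, width) VALUES (?, ?, ?)",
                [(p.time, p.height, p.width) for p in self.buffer],
            )
        self.buffer.clear()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    writer = BufferedWriter(conn, capacity=500)
    for i in range(1200):
        writer.add(DataPoint(time=i / 500, height=1.0, width=2.0))
    writer.flush()  # write the remaining partial batch
    print(conn.execute("SELECT COUNT(*) FROM ticks").fetchone()[0])  # → 1200
```

The trade-off is durability: anything still in `buffer` is lost if the process dies before the next flush, and queries against the database only see points that have already been flushed.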
A nice feature for the database would be to push the points to subscribers. Is this possible with SQL Server?
Thanks, Juergen
Upvotes: 1
Reputation: 1720
I would say the bigger question is how you plan to store this in SQL. I would queue the data points in memory for a period of time (1 second?) and then write a single row to the database with a blob or nvarchar field containing all the data for that second, as this will let the database scale further. The row could also contain some summary information about what happened in that second, which you could use when querying the data to reduce the load of your selects. Of course, this wouldn't be feasible if you want to run direct queries on the individual data points.
It all depends on what you plan to do with the data.
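The per-second batching idea from this answer could be sketched as follows. This is only an illustration under several assumptions: Python's built-in `sqlite3` stands in for SQL Server, each point is reduced to a hypothetical `(timestamp_in_seconds, value)` pair, and the table and column names (`second_batches`, `min_value`, `payload`, ...) are made up. The raw points for each second are serialized to JSON in a text column (playing the role of the "blob or nvarchar" field), next to a few summary columns that can be queried without decoding the payload.

```python
import json
import sqlite3
from collections import defaultdict


def group_by_second(points):
    """Group hypothetical (timestamp, value) pairs by whole second."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[int(ts)].append(value)
    return buckets


def write_second_rows(conn, points):
    """Write one row per second: summary columns plus the raw points as JSON."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS second_batches (
               second INTEGER PRIMARY KEY,
               n INTEGER,
               min_value REAL,
               max_value REAL,
               avg_value REAL,
               payload TEXT)"""
    )
    rows = []
    for second, values in sorted(group_by_second(points).items()):
        rows.append((
            second,
            len(values),
            min(values),
            max(values),
            sum(values) / len(values),
            json.dumps(values),  # all raw points of this second in one text field
        ))
    with conn:
        conn.executemany(
            "INSERT INTO second_batches VALUES (?, ?, ?, ?, ?, ?)", rows
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Three seconds of data at 500 points per second.
    write_second_rows(conn, [(i / 500, float(i % 7)) for i in range(1500)])
    print(conn.execute("SELECT second, n, max_value FROM second_batches").fetchall())
```

A query such as `SELECT second FROM second_batches WHERE max_value > 5` touches only the summary columns; anything that needs individual points has to load and decode `payload`, which is the limitation the answer mentions.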
Upvotes: 1
Reputation: 4662
If it is not multi-user, then keeping the data points in memory in a collection class is definitely the winner.
If it is multi-user, then I would go for some sort of shared in-memory data structure on the server side
that persists the data to the database from time to time.
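A shared server-side structure that persists periodically might look like the sketch below. It is an assumption-laden illustration, not a prescribed design: `SharedTickStore`, its methods and the `ticks` table are hypothetical, `sqlite3` stands in for the real database, and a background thread writes pending points every `interval` seconds while readers are served from an in-memory window of recent points.

```python
import sqlite3
import threading
from collections import deque


class SharedTickStore:
    """Hypothetical server-side store shared by many producers and readers."""

    def __init__(self, conn: sqlite3.Connection, interval: float = 1.0,
                 recent_size: int = 1000):
        self._lock = threading.Lock()     # guards _pending and _recent
        self._db_lock = threading.Lock()  # serializes use of the connection
        self._pending = []                # points not yet written to the database
        self._recent = deque(maxlen=recent_size)  # in-memory window for readers
        self._conn = conn
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        with self._db_lock:
            conn.execute("CREATE TABLE IF NOT EXISTS ticks (time REAL, value REAL)")

    def add(self, timestamp: float, value: float) -> None:
        with self._lock:
            self._pending.append((timestamp, value))
            self._recent.append((timestamp, value))

    def latest(self, n: int) -> list:
        # Readers get recent points from memory instead of querying the database.
        with self._lock:
            return list(self._recent)[-n:]

    def persist(self) -> int:
        # Swap the pending list out under the lock so producers block only briefly.
        with self._lock:
            batch, self._pending = self._pending, []
        if batch:
            with self._db_lock, self._conn:  # the connection commits on exit
                self._conn.executemany("INSERT INTO ticks VALUES (?, ?)", batch)
        return len(batch)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
        self.persist()  # flush whatever arrived after the last periodic write

    def _run(self) -> None:
        # Event.wait() returns False on timeout and True once stop() sets the event.
        while not self._stop.wait(self._interval):
            self.persist()


if __name__ == "__main__":
    # check_same_thread=False lets the background thread use this connection.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    store = SharedTickStore(conn, interval=0.1)
    store.start()
    for i in range(1000):
        store.add(i / 500, float(i))
    store.stop()
    print(conn.execute("SELECT COUNT(*) FROM ticks").fetchone()[0])  # → 1000
```

The persistence interval is the knob: a shorter interval loses less data on a crash, a longer one means fewer, larger writes.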
Upvotes: 1
Reputation: 29956
Putting the "sending to subscribers" requirement aside, don't get into the trap of premature optimization.
I would try the simplest solution first, which is probably just writing the data into the database as it arrives. Then run stress tests. If the performance isn't up to scratch, find the bottlenecks and optimize them out.
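A rough sketch of "simplest first, then stress test": insert each point as it arrives (one commit per point) and measure the achieved rate against the 500 points/second target from the question. SQLite on a local temporary file is only a stand-in here; the function names are hypothetical, and the numbers it prints say nothing about SQL Server, so the same test would need to be rerun against the real database.

```python
import os
import sqlite3
import tempfile
import time


def insert_point(conn, t, height, width):
    # Simplest approach: one INSERT and one commit per incoming tick.
    with conn:
        conn.execute("INSERT INTO ticks VALUES (?, ?, ?)", (t, height, width))


def stress_test(conn, n_points=1000):
    """Insert n_points one at a time and return the achieved points per second."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ticks (time REAL, height REAL, width REAL)"
    )
    start = time.perf_counter()
    for i in range(n_points):
        insert_point(conn, i / 500, 1.0, 2.0)
    elapsed = time.perf_counter() - start
    return n_points / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "ticks.db"))
        rate = stress_test(conn)
        conn.close()
    print(f"{rate:.0f} points/s with one commit per point (target: 500/s)")
```

Only if the measured rate falls short (or the per-point calculation turns out to be the bottleneck) would batching or an in-memory buffer be worth the extra complexity.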
Turning to the "sending to subscribers" requirement, this isn't really something relational database platforms are typically designed for (they are more about storing data and exposing it for on-demand retrieval). A pub-sub type requirement is usually best solved using some kind of message bus. Perhaps take a look at something like NServiceBus.
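To illustrate the publish/subscribe idea itself, here is a toy in-process bus. It is explicitly not NServiceBus and not its API: the `InProcessBus` class and the `"ticks"` topic are hypothetical, and a real message bus adds what this sketch lacks (separate processes, durable queues, retries). The point is the decoupling: the code that receives ticks publishes each one once, and any number of subscribers (a database writer, a live display, a calculation) react independently.

```python
from collections import defaultdict
from typing import Any, Callable


class InProcessBus:
    """Toy topic-based publish/subscribe bus (illustration only, not NServiceBus)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Any) -> int:
        # Deliver synchronously to every handler; return how many received it.
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(message)
        return len(handlers)


if __name__ == "__main__":
    bus = InProcessBus()
    stored = []
    bus.subscribe("ticks", stored.append)                            # e.g. a database writer
    bus.subscribe("ticks", lambda p: print("height:", p["height"]))  # e.g. a live display
    bus.publish("ticks", {"time": 0.0, "height": 1.5, "width": 2.0})  # prints "height: 1.5"
```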
Upvotes: 2