Jonas m

Reputation: 2734

Best practice for custom statistics

I'm in a situation where I have to build a statistics module that can store user-related statistical information.

Basically, all that's stored is an event identifier, a datetime object, the number of times this event has been fired, and the id of the object being interacted with.

I've made similar systems before, but never anything that has to store this amount of information.

My suggestion would be a simple table in the database, e.g. "statistics", containing the following rows

Usually, this method works fine, enabling me to store statistics about the object in a given timeframe (inserting a new datetime every hour or every 15 minutes, so the statistics update every 15 minutes).
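A minimal sketch of that bucketed approach, assuming columns derived from the fields described above (event identifier, object id, time bucket, fire count — the question's actual column list is not shown, so these names are illustrative), using SQLite for brevity:

```python
import sqlite3
from datetime import datetime

# Hypothetical schema: one row per (event, object, 15-minute window).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE statistics (
        event_id   TEXT NOT NULL,
        object_id  INTEGER NOT NULL,
        bucket     TEXT NOT NULL,      -- start of the 15-minute window
        fire_count INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (event_id, object_id, bucket)
    )
""")

def bucket_15min(ts: datetime) -> str:
    """Truncate a timestamp to the start of its 15-minute window."""
    return ts.replace(minute=ts.minute - ts.minute % 15,
                      second=0, microsecond=0).isoformat()

def record_event(event_id: str, object_id: int, ts: datetime) -> None:
    # Upsert: create the bucket row on the first fire, then increment.
    conn.execute("""
        INSERT INTO statistics (event_id, object_id, bucket, fire_count)
        VALUES (?, ?, ?, 1)
        ON CONFLICT(event_id, object_id, bucket)
        DO UPDATE SET fire_count = fire_count + 1
    """, (event_id, object_id, bucket_15min(ts)))

record_event("page_view", 42, datetime(2014, 6, 1, 10, 7))
record_event("page_view", 42, datetime(2014, 6, 1, 10, 12))  # same window
row = conn.execute("SELECT fire_count FROM statistics").fetchone()
print(row[0])  # both events land in the 10:00-10:15 bucket
```

Note the composite primary key: each event/object pair gets at most one row per window, so the table grows with the number of active buckets rather than the raw event volume.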

Now, my questions are:

I hope my questions are understandable, and I'm looking forward to getting wiser on this topic. Best regards, Jonas

Upvotes: 2

Views: 658

Answers (1)

mhoglan

Reputation: 371

I believe one of the issues you are going to run into is that you want two worlds at once: transactional and analytical. That is fine in small cases, but problems appear when you start to scale, especially into the realm of 500M+ records.

I would suggest separating the two: generate events and keep track of just the event itself. You would then run analytical queries to get things such as the count of events per object interaction. These counts, or other metric calculations, could be aggregated into a report table periodically.
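The separation described above can be sketched as two tables, a raw event log on the transactional side and a periodically rebuilt report table on the analytical side (table and column names here are illustrative, not from the answer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Transactional side: append-only event log.
    CREATE TABLE events (
        event_id  TEXT NOT NULL,
        object_id INTEGER NOT NULL,
        fired_at  TEXT NOT NULL
    );
    -- Analytical side: aggregated report, refreshed periodically.
    CREATE TABLE report_event_counts (
        event_id  TEXT NOT NULL,
        object_id INTEGER NOT NULL,
        n_events  INTEGER NOT NULL,
        PRIMARY KEY (event_id, object_id)
    );
""")

# Writes are cheap inserts; no aggregation happens on this path.
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("click", 1, "2014-06-01T10:00:00"),
     ("click", 1, "2014-06-01T10:05:00"),
     ("click", 2, "2014-06-01T10:06:00")])

# Run on a schedule (e.g. cron) to rebuild the report from the raw log.
conn.executescript("""
    DELETE FROM report_event_counts;
    INSERT INTO report_event_counts
    SELECT event_id, object_id, COUNT(*)
    FROM events
    GROUP BY event_id, object_id;
""")

counts = dict(conn.execute(
    "SELECT object_id, n_events FROM report_event_counts "
    "WHERE event_id = 'click'"))
print(counts)  # {1: 2, 2: 1}
```

The point of the split is that the hot write path never pays for aggregation, and the analytical queries never contend with the write load.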

As for tracking events, you could either keep them in a table of event occurrences, or have something in front of the database that does the tracking and then provides the periodic aggregations to the database. Think of the world of monitoring systems, which use collection agents to generate events that flow to an aggregation layer, which then writes a periodic metric snapshot to an analytical area (e.g. CollectD to StatsD / Graphite to Whisper).
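A toy version of that aggregation layer, in the spirit of StatsD (this class is a hypothetical sketch, not any real library's API): events increment an in-memory counter, and only the periodic flush touches the database.

```python
from collections import Counter

class EventAggregator:
    """In-memory event counter flushed to storage on a timer."""

    def __init__(self):
        self.counts = Counter()

    def fire(self, event_id, object_id):
        # Cheap in-memory increment; no database write per event.
        self.counts[(event_id, object_id)] += 1

    def flush(self):
        # Called periodically (e.g. every 15 minutes); returns the
        # snapshot to persist to the analytical store, then resets.
        snapshot, self.counts = self.counts, Counter()
        return dict(snapshot)

agg = EventAggregator()
for _ in range(3):
    agg.fire("click", 42)
agg.fire("view", 7)
print(agg.flush())  # {('click', 42): 3, ('view', 7): 1}
print(agg.flush())  # {} -- counters reset after each flush
```

The trade-off is durability: events buffered since the last flush are lost if the process dies, which is usually acceptable for monitoring-style metrics but worth stating explicitly.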

Disclaimer: I am an architect for InfiniDB.

I'm not sure what kind of datasource you are using, but as you grow and determine how much history to keep, you will probably face sizing problems, as most people do when collecting event or monitoring data. If you are on MySQL / MariaDB / PostgreSQL, I would suggest you check out InfiniDB (an open-source columnar MPP database for analytics); it is fully open source (GPLv2) and will provide the performance you need to run queries over billions of rows and TBs of data to answer those analytical questions.

Upvotes: 1
