Sahil Sharma

Reputation: 4247

database: aggregation vs storing pre-computed data for lookup in a large data store?

We have a table with 2 million records, with new rows inserted at about 30 records per minute.

The table structure is: |Complaint | Mobile | RequestDateTime|. We receive a complaint, the user's mobile number, and the current date, and save them in this table. The table can contain multiple rows with the same mobile number.

We only have to save the new record if we have received fewer than 12 requests from the same "Mobile" in the last 30 days. We have two approaches for counting the number of records with a particular mobile number:

  1. Every time, before inserting a new record, first run an aggregation query against the database to check whether we already have 12 records with that number. If yes, don't insert a new one. The problem here is the performance of the query.
  2. Or, at the end of each day, store the per-mobile counts for the last 29 days in a separate table, run the aggregate only on the current day's rows, and add the two counts to check whether the total is less than 12.

Or can someone suggest a better solution?
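The second approach could be sketched as follows. This is only an illustration using SQLite; the table names (`complaints`, `daily_counts`), column names, and the end-of-day refresh job are assumptions, not anything the question specifies.

```python
import sqlite3
from datetime import datetime

# Sketch of approach 2: a summary table `daily_counts` (assumed to be
# refreshed by an end-of-day job) holds per-mobile counts for the
# trailing 29 days; only today's rows are aggregated at insert time.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE complaints (complaint TEXT, mobile TEXT, request_dt TEXT);
    CREATE TABLE daily_counts (mobile TEXT PRIMARY KEY, cnt_29d INTEGER);
""")

def can_insert(mobile, limit=12):
    # Pre-computed count for the trailing 29 days (refreshed at EOD)...
    row = conn.execute(
        "SELECT cnt_29d FROM daily_counts WHERE mobile = ?", (mobile,)
    ).fetchone()
    older = row[0] if row else 0
    # ...plus a cheap aggregate over today's rows only.
    today = datetime.now().strftime("%Y-%m-%d")
    (recent,) = conn.execute(
        "SELECT COUNT(*) FROM complaints WHERE mobile = ? AND request_dt >= ?",
        (mobile, today),
    ).fetchone()
    return older + recent < limit

def save_complaint(text, mobile):
    # Insert only while the 30-day total stays under the limit.
    if not can_insert(mobile):
        return False
    conn.execute(
        "INSERT INTO complaints VALUES (?, ?, ?)",
        (text, mobile, datetime.now().isoformat()),
    )
    return True
```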

Upvotes: 0

Views: 752

Answers (2)

Rick James

Reputation: 142560

I'll make the argument for checking on the fly...

This is the query, correct?

SELECT COUNT(*)
    FROM tbl
    WHERE num = 1234
      AND datetime >= NOW() - INTERVAL 30 DAY; 
  • You have less than one INSERT per second.
  • The query should never return more than 12, correct?
  • With INDEX(num, datetime), that SELECT is very fast.
  • Now you are at 60 queries/minute (1/sec): one SELECT plus (usually) one INSERT per record. Things don't get dicey until over 100/sec.
  • BTree indexing is very efficient, and is not affected much by the table size. So, 2M rows is not a factor. Nor would 2 billion.
  • I suggest that this solution is simpler than the other choices. KISS.

(I'm a big proponent of Summary Tables, but I can't justify it here.)
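The check-on-the-fly flow can be sketched end-to-end like this. It uses SQLite purely for illustration (the question doesn't name the engine); the table and column names follow the query above, and the composite index mirrors the one suggested.

```python
import sqlite3

# Minimal sketch of "check on the fly": count matching rows in the last
# 30 days, insert only if under the limit. Table/column names are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (complaint TEXT, num TEXT, dt TEXT);
    -- The composite index lets the range scan touch only the (at most ~12)
    -- rows for that number, regardless of total table size.
    CREATE INDEX idx_num_dt ON tbl (num, dt);
""")

def insert_if_allowed(complaint, num, limit=12):
    # The SELECT from the answer, parameterized by mobile number.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM tbl "
        "WHERE num = ? AND dt >= datetime('now', '-30 days')",
        (num,),
    ).fetchone()
    if count >= limit:
        return False          # already 12 requests in the window
    conn.execute(
        "INSERT INTO tbl VALUES (?, ?, datetime('now'))", (complaint, num)
    )
    return True
```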

Upvotes: 1

Gyanendra Dwivedi

Reputation: 5557

Another approach could be:

1) At each EOD, maintain a negative-list table of mobile numbers that have reached 12 complaints in the last 30 days.

2) If a record's mobile number is on that list, never process it. You could even cache those records in memory (depending on the technologies you are using) and avoid the DB call altogether.

3) Optionally, you could schedule a batch job (say, every hour) to update the negative list and the cache with additional entries.

EDIT based on comments

4) If you do not find the record in the negative-list table, proceed with your normal counting logic.

5) You can use an LRU (Least Recently Used) cache strategy to keep counts in memory for frequent records and avoid database calls. Make sure to flush a record into the negative list as soon as its 12th record is inserted in the main DB.
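Steps 1-5 could be combined roughly as follows. This is a sketch only: the names (`negative_list`, `count_from_db`), the cache size, and the stubbed-out DB lookup are all assumptions for illustration.

```python
from collections import OrderedDict

# Negative list + small in-memory LRU count cache, per the steps above.
LIMIT = 12
negative_list = set()     # refreshed at EOD / hourly from the DB (step 1, 3)
cache = OrderedDict()     # mobile -> known count, in LRU order (step 5)
CACHE_SIZE = 10_000       # assumed cache capacity

def count_from_db(mobile):
    # Placeholder for the normal aggregation query (step 4).
    return 0

def should_insert(mobile):
    if mobile in negative_list:          # cheapest check first, no DB call
        return False
    if mobile in cache:
        cache.move_to_end(mobile)        # mark as recently used
        count = cache[mobile]
    else:
        count = count_from_db(mobile)    # fall back to counting in the DB
        cache[mobile] = count
        if len(cache) > CACHE_SIZE:
            cache.popitem(last=False)    # evict least recently used entry
    if count >= LIMIT:
        negative_list.add(mobile)        # flush to negative list at the cap
        return False
    cache[mobile] = count + 1            # account for the insert we allow
    return True
```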

Upvotes: 1
