SQL Database Design - Cache Tables?

Question

What's a common/best practice for database design when it comes to improving performance on count(1) queries? (I'm currently using SQLite)

I've normalized my data, so it is spread across multiple tables. For simple things I want to do on a single table with a good index, queries are acceptably quick for my purposes.

e.g.:

SELECT count(1) from actions where type='3' and area='5' and employee='2533';

But when I start getting into multiple table queries, things get too slow (> 1 second).

SELECT count(1) 
  from
  (SELECT SID from actions 
      where type='3' and employee='2533' 
   INTERSECT 
     SELECT SID from transactions where currency='USD') x;

How should I cache my results? What is a good design? My natural reaction is to add a table solely for storing rows of cached results per employee.

StuartLC · Accepted Answer

Edit

Design patterns like Command Query Responsibility Segregation (CQRS) specifically aim to improve the read side performance of data access, often in distributed systems and at enterprise scale.

Commands are issued to indicate 'transactions' or changes / updates to data.
When a system processes these commands (e.g. by updating database tables), the new state of the affected objects is 'broadcast'.
Systems which are interested (such as a user interface or a queryable REST API) subscribe to these data changes, and then 'shape' the updated data to their specific needs.
This updated data is then cached (the cache is often called a 'Read Store').
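As a rough illustration of the four steps above, here is a minimal in-process sketch. Everything in it is hypothetical (`handle_add_action`, `update_counts`, the dict-based `read_store`); a real CQRS system would typically use a message bus and separate stores rather than Python lists and dicts.

```python
from collections import defaultdict

subscribers = []                # callbacks interested in state changes
write_store = []                # stand-in for the normalized tables
read_store = defaultdict(int)   # cached counts keyed by (employee, type)


def subscribe(callback):
    """Register a read-side component that wants to hear about changes."""
    subscribers.append(callback)


def handle_add_action(employee, type_):
    """Command handler: apply the change, then broadcast the new state."""
    event = {"employee": employee, "type": type_}
    write_store.append(event)
    for callback in subscribers:
        callback(event)


def update_counts(event):
    """Read-side projection: shape the change into a precomputed count."""
    read_store[(event["employee"], event["type"])] += 1


subscribe(update_counts)
handle_add_action("2533", "3")
handle_add_action("2533", "3")
handle_add_action("2533", "1")

# Queries now read the precomputed value instead of scanning write_store.
cached = read_store[("2533", "3")]
```

The point of the split is that the query side never touches `write_store`; it only reads whatever shape the projection has prepared.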

Another pattern commonly associated with CQRS is "Event Sourcing", which stores the sequence of events (state changes) and allows them to be replayed for various use cases.

The above may be overkill for your scenario, but a very simple implementation of caching at an internal app level could be done via a SQLite trigger.

Assuming that there are many more 'reads' than writes to your actions or transactions tables,

You could create cache tables, one for "SID for actions by type by employee" and one for "SID for transactions by currency", or even combine the two (depending on what other scenarios you have for querying).
You would then need to update these cache table(s) every time the underlying actions or transactions tables change. One cheap (and nasty) way would be to add INSERT, UPDATE and DELETE triggers on the actions and transactions tables, which would then update the appropriate cache table(s).
Your 'query' interface would now primarily interact with the cache tables, using the 'derived' data (such as the counts).
You may still however need to handle cache miss scenarios, such as the initial 'seed' of these cache tables, or if the cache tables need to be regenerated.
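The trigger approach described above could look roughly like the following sketch, written in Python with the standard `sqlite3` module so it can be run as-is. The schema is an assumption (the question does not show it), and the cache table `action_counts`, the trigger names, and the helpers `rebuild_cache` and `cached_count` are all made up for illustration. It caches a count per (employee, type) rather than the SIDs themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed shape of the source table.
CREATE TABLE actions (SID INTEGER, type TEXT, area TEXT, employee TEXT);

-- Hypothetical cache table: one row per (employee, type) with its count.
CREATE TABLE action_counts (
    employee TEXT NOT NULL,
    type     TEXT NOT NULL,
    n        INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (employee, type)
);

CREATE TRIGGER actions_ai AFTER INSERT ON actions
BEGIN
    -- Create the counter row on first use, then bump it.
    INSERT INTO action_counts (employee, type, n)
        SELECT NEW.employee, NEW.type, 0
        WHERE NOT EXISTS (SELECT 1 FROM action_counts
                          WHERE employee = NEW.employee AND type = NEW.type);
    UPDATE action_counts SET n = n + 1
        WHERE employee = NEW.employee AND type = NEW.type;
END;

CREATE TRIGGER actions_ad AFTER DELETE ON actions
BEGIN
    UPDATE action_counts SET n = n - 1
        WHERE employee = OLD.employee AND type = OLD.type;
END;

CREATE TRIGGER actions_au AFTER UPDATE OF employee, type ON actions
BEGIN
    -- Move one unit from the old key to the new key.
    UPDATE action_counts SET n = n - 1
        WHERE employee = OLD.employee AND type = OLD.type;
    INSERT INTO action_counts (employee, type, n)
        SELECT NEW.employee, NEW.type, 0
        WHERE NOT EXISTS (SELECT 1 FROM action_counts
                          WHERE employee = NEW.employee AND type = NEW.type);
    UPDATE action_counts SET n = n + 1
        WHERE employee = NEW.employee AND type = NEW.type;
END;
""")


def rebuild_cache(conn):
    """Seed or regenerate the cache from the source table."""
    conn.execute("DELETE FROM action_counts")
    conn.execute(
        "INSERT INTO action_counts (employee, type, n) "
        "SELECT employee, type, count(*) FROM actions GROUP BY employee, type"
    )


def cached_count(conn, employee, type_):
    """Read the precomputed count; a missing row means zero."""
    row = conn.execute(
        "SELECT n FROM action_counts WHERE employee = ? AND type = ?",
        (employee, type_),
    ).fetchone()
    return row[0] if row else 0


conn.executemany(
    "INSERT INTO actions VALUES (?, ?, ?, ?)",
    [(1, "3", "5", "2533"), (2, "3", "5", "2533"), (3, "1", "5", "2533")],
)
conn.execute("DELETE FROM actions WHERE SID = 2")
conn.execute("UPDATE actions SET type = '3' WHERE SID = 3")
count_now = cached_count(conn, "2533", "3")
```

`rebuild_cache` covers the 'seed' and regeneration cases: it can be run once when the cache table is first introduced, or whenever the cached counts are suspected to have drifted.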

In addition to a local relational database like SQLite, NoSQL databases like MongoDB, Cassandra and Redis are frequently used as read-side caches in read-heavy environments (depending on the type and format of data that you need to cache). You would, however, need an alternative way to synchronize data from your 'master' (e.g. SQLite) database to these read stores; triggers obviously won't cut it here.

Original Answer

If you are 100% sure that you are always repeating exactly the same query for the same customer, sure, persist the result.

However, in most other instances, an RDBMS usually handles caching just fine.

The INTERSECT with the query

SELECT SID from transactions where currency='USD'

could be problematic if a large number of transaction records have currency USD.

Possibly you could replace this with a join?

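SELECT count(1) from
(
    -- Unlike INTERSECT, a join keeps duplicates: if SID can repeat in either
    -- table, use SELECT DISTINCT t.SID here to get the same count.
    SELECT t.SID
    from
        transactions as t
        inner join
        (
            SELECT SID from actions where type='3' and employee='2533'
        ) as a
        on t.SID = a.SID
    where t.currency = 'USD'
) as x;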
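To check that the two forms agree on your data, a small harness like the one below can help. It is only a sketch: the table layouts and the sample rows are invented, and SID is deliberately duplicated in `actions` to show where the join and the INTERSECT can diverge.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actions (SID INTEGER, type TEXT, area TEXT, employee TEXT);
CREATE TABLE transactions (SID INTEGER, currency TEXT);

-- SID 1 appears twice in actions (two areas) to expose duplicate handling.
INSERT INTO actions VALUES
    (1, '3', '5', '2533'),
    (1, '3', '6', '2533'),
    (2, '3', '5', '2533'),
    (3, '1', '5', '2533');
INSERT INTO transactions VALUES (1, 'USD'), (2, 'EUR'), (3, 'USD');
""")

# Original form: INTERSECT yields each matching SID once.
intersect_count = conn.execute("""
    SELECT count(1) FROM (
        SELECT SID FROM actions WHERE type = '3' AND employee = '2533'
        INTERSECT
        SELECT SID FROM transactions WHERE currency = 'USD'
    ) x
""").fetchone()[0]

# Join form: one output row per matching pair, so duplicates are counted.
join_count = conn.execute("""
    SELECT count(1)
    FROM transactions AS t
    INNER JOIN actions AS a ON t.SID = a.SID
    WHERE t.currency = 'USD' AND a.type = '3' AND a.employee = '2533'
""").fetchone()[0]

# Join form with DISTINCT: matches the INTERSECT semantics again.
join_distinct_count = conn.execute("""
    SELECT count(DISTINCT t.SID)
    FROM transactions AS t
    INNER JOIN actions AS a ON t.SID = a.SID
    WHERE t.currency = 'USD' AND a.type = '3' AND a.employee = '2533'
""").fetchone()[0]
```

If SID is unique in both tables, the plain join and the INTERSECT return the same count and the DISTINCT is unnecessary.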

However, you might just need to check your indexes:

For

SELECT count(1) from actions where type='3' and area='5' and employee='2533'
SELECT SID from actions where type='3' and employee='2533'

An index on Actions(Employee, Type) or Actions(Employee, Type, Area) would make sense (assuming Employee has highest selectivity, and depending on the selectivity of Type and Area).

You can also compare this to an index on Actions(Employee, Type, Area, SID) as a covering index for your second query.

And for the join above, you need an index on Transactions(SID, Currency).
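One way to try these index suggestions is to create them and ask SQLite which plan it picks via `EXPLAIN QUERY PLAN`. The sketch below assumes the same table layout as the question implies; the index names and the `plan` helper are made up, and the exact wording of the plan text varies between SQLite versions, so it is best read by eye rather than parsed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actions (SID INTEGER, type TEXT, area TEXT, employee TEXT);
CREATE TABLE transactions (SID INTEGER, currency TEXT);

-- Covering index for both actions queries (SID is included as last column).
CREATE INDEX idx_actions_emp_type_area_sid
    ON actions (employee, type, area, SID);
-- Index supporting the join lookup on transactions.
CREATE INDEX idx_transactions_sid_currency
    ON transactions (SID, currency);
""")


def plan(sql):
    """Return the human-readable detail strings of the query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # detail is the last column


count_plan = plan(
    "SELECT count(1) FROM actions "
    "WHERE type = '3' AND area = '5' AND employee = '2533'"
)

join_plan = plan(
    "SELECT count(1) FROM transactions AS t "
    "INNER JOIN actions AS a ON t.SID = a.SID "
    "WHERE t.currency = 'USD' AND a.type = '3' AND a.employee = '2533'"
)

for line in count_plan + join_plan:
    print(line)
```

A plan line that mentions a full table scan instead of one of the index names is a hint that the index does not match the query's WHERE or join columns. Running `ANALYZE` on realistic data before checking the plan gives the planner better statistics to choose from.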
