Steven M
Steven M

Reputation: 574

MySQL - Most efficient way to count relations

I have two tables in MySQL. One with accounts (40 Million records) and one with transactions (300 Million records). There are around 200k transactions added daily. They have a 1:n relation:

+-----------------+
| Accounts        |
+----+------------+
| ID | account_no |
+----+------------+

and

+--------------------------------+
| Transactions                   |
+----+------------+--------+-...-+
| ID | account_no | amount | ... |
+----+------------+--------+-...-+

I need to write a SQL Query to create a list of accounts ordered by amount of transactions. So something like this comes to mind:

SELECT a.account_no , COUNT(a.account_no) tx_count FROM accounts a
INNER JOIN transactions tx ON a.account_no = tx.account_no
GROUP BY tx.account_no ORDER BY tx_count DESC LIMIT 1000;

(Link to SQLfiddle)

My Question: What is the most efficient way to achieve this, given that I have to deal with tens and hundreds of millions of records?

[Note: The field account_no is of course indexed]

Upvotes: 1

Views: 1193

Answers (2)

O. Jones
O. Jones

Reputation: 108816

You're seeking efficiency. So stick to one table, skipping the JOIN. And, use COUNT(*). It counts raw rows. COUNT(something_else) needs to check something_else values for null before counting them. (http://sqlfiddle.com/#!9/cd6bb2/12/0)

 SELECT account_no, 
        COUNT(*) transaction_count
   FROM transactions
  GROUP BY account_no
  ORDER BY COUNT(*) DESC
  LIMIT 1000

It happens that the MyISAM access method can satisfy this query from the index directly. InnoDB will have to scan the index.

If your production report needs all the accounts summarized rather than just the top 1000, then you might consider omitting ORDER BY COUNT(*). That will save some time on the dbms.

The guys at Percona have lots of geeked-out explanations of how queries exploit indexes. For example: https://www.percona.com/blog/2012/11/23/full-table-scan-vs-full-index-scan-performance/

Upvotes: 1

Mjh
Mjh

Reputation: 2956

To gain performance, you must do the least amount of work possible.

Performing a COUNT(*) doesn't fit that mantra.

Therefore, if you're after efficient way of obtaining the number you're after - don't use COUNT(*) at all. Simply create an integer field that contains the count number for particular record in Account and use triggers to handle incrementing/decrementing. It might sound that triggers will incur a performance penalty - they won't and the whole process becomes trivial while trying to obtain transaction count per account.

You can index the trx_count field and your queries become:

SELECT * FROM Accounts ORDER BY trx_count DESC;

The table alteration:

ALTER TABLE Accounts add trx_count int unsigned  not null default '0';

After insert trigger on Transactions

DELIMITER $$
CREATE

    TRIGGER `Transactions_after_insert` AFTER INSERT ON `Transactions` 
    FOR EACH ROW BEGIN
        UPDATE Accounts SET trx_count = trx_count + 1 WHERE id = new.account_no;
    END;
$$

DELIMITER ;

After delete trigger on Transactions

DELIMITER $$
CREATE

    TRIGGER `Transactions_after_delete` AFTER DELETE ON `Transactions` 
    FOR EACH ROW BEGIN
             UPDATE Accounts SET trx_count = trx_count - 1 WHERE id = old.account_no;
    END;
$$

DELIMITER ;

Upvotes: 1

Related Questions