Reputation: 574
I have two tables in MySQL. One with accounts
(40 Million records) and one with transactions
(300 Million records). There are around 200k transactions added daily. They have a 1:n relation:
+-----------------+
| Accounts |
+----+------------+
| ID | account_no |
+----+------------+
and
+--------------------------------+
| Transactions |
+----+------------+--------+-...-+
| ID | account_no | amount | ... |
+----+------------+--------+-...-+
I need to write a SQL Query to create a list of accounts
ordered by amount of transactions. So something like this comes to mind:
SELECT a.account_no , COUNT(a.account_no) tx_count FROM accounts a
INNER JOIN transactions tx ON a.account_no = tx.account_no
GROUP BY tx.account_no ORDER BY tx_count DESC LIMIT 1000;
My Question: What is the most efficient way to achieve this, given that I have to deal with tens and hundreds of millions of records?
[Note: The field account_no
is of course indexed]
Upvotes: 1
Views: 1193
Reputation: 108816
You're seeking efficiency. So stick to one table, skipping the JOIN. And, use COUNT(*)
. It counts raw rows. COUNT(something_else)
needs to check something_else
values for null before counting them. (http://sqlfiddle.com/#!9/cd6bb2/12/0)
SELECT account_no,
COUNT(*) transaction_count
FROM transactions
GROUP BY account_no
ORDER BY COUNT(*) DESC
LIMIT 1000
It happens that the MyISAM access method can satisfy this query from the index directly. InnoDB will have to scan the index.
If your production report needs all the accounts summarized rather than just the top 1000, then you might consider omitting ORDER BY COUNT(*)
. That will save some time on the dbms.
The guys at Percona have lots of geeked-out explanations of how queries exploit indexes. For example: https://www.percona.com/blog/2012/11/23/full-table-scan-vs-full-index-scan-performance/
Upvotes: 1
Reputation: 2956
To gain performance, you must do the least amount of work possible.
Performing a COUNT(*)
doesn't fit that mantra.
Therefore, if you're after efficient way of obtaining the number you're after - don't use COUNT(*)
at all. Simply create an integer field that contains the count number for particular record in Account
and use triggers to handle incrementing/decrementing. It might sound that triggers will incur a performance penalty - they won't and the whole process becomes trivial while trying to obtain transaction count per account.
You can index the trx_count
field and your queries become:
SELECT * FROM Accounts ORDER BY trx_count DESC;
ALTER TABLE Accounts add trx_count int unsigned not null default '0';
Transactions
DELIMITER $$
CREATE
TRIGGER `Transactions_after_insert` AFTER INSERT ON `Transactions`
FOR EACH ROW BEGIN
UPDATE Accounts SET trx_count = trx_count + 1 WHERE id = new.account_no;
END;
$$
DELIMITER ;
Transactions
DELIMITER $$
CREATE
TRIGGER `Transactions_after_delete` AFTER DELETE ON `Transactions`
FOR EACH ROW BEGIN
UPDATE Accounts SET trx_count = trx_count - 1 WHERE id = old.account_no;
END;
$$
DELIMITER ;
Upvotes: 1