Reputation: 1074
I have been trying to optimise some SQL queries based on the assumption that Joining tables is more efficient than nesting queries. I am joining the same table multiple times to perform a different analysis on the data.
I have 2 tables:
transactions:
id | date_add | merchant_ id | transaction_type | amount
1 1488733332 108 add 20.00
2 1488733550 108 remove 5.00
and a calendar table which just lists dates so that I can create empty records where there are no transactions on particular days:
calendar:
id | datefield
1 2017-03-01
2 2017-03-02
3 2017-03-03
4 2017-03-04
I have many thousands of rows in the transactions table, and I'm trying to get an annual summary of total and different types of transactions per month (i.e 12 rows in total), where
result:
month | transactions | additions | redemptions
Jan 15 12 3
Feb 20 15 5
...
My initial query looks like this:
SELECT COALESCE(tr.transactions, 0) AS transactions,
COALESCE(ad.additions, 0) AS additions,
COALESCE(re.redemptions, 0) AS redemptions,
calendar.date
FROM (SELECT DATE_FORMAT(datefield, '%b %Y') AS date FROM calendar WHERE datefield LIKE '2017-%' GROUP BY YEAR(datefield), MONTH(datefield)) AS calendar
LEFT JOIN (SELECT COUNT(transaction_type) as transactions, from_unixtime(date_add, '%b %Y') as date_t FROM transactions WHERE merchant_id = 108 GROUP BY from_unixtime(date_add, '%b %Y')) AS tr
ON calendar.date = tr.date_t
LEFT JOIN (SELECT COUNT(transaction_type = 'add') as additions, from_unixtime(date_add, '%b %Y') as date_a FROM transactions WHERE merchant_id = 108 AND transaction_type = 'add' GROUP BY from_unixtime(date_add, '%b %Y')) AS ad
ON calendar.date = ad.date_a
LEFT JOIN (SELECT COUNT(transaction_type = 'remove') as redemptions, from_unixtime(date_add, '%b %Y') as date_r FROM transactions WHERE merchant_id = 108 AND transaction_type = 'remove' GROUP BY from_unixtime(date_add, '%b %Y')) AS re
ON calendar.date = re.date_r
I tried optimising and cleaning it up a little, removing the nested statements and came up with this:
SELECT
DATE_FORMAT(cal.datefield, '%b %d') as date,
IFNULL(count(ct.amount),0) as transactions,
IFNULL(count(a.amount),0) as additions,
IFNULL(count(r.amount),0) as redeptions
FROM calendar as cal
LEFT JOIN transactions as ct ON cal.datefield = date(from_unixtime(ct.date_add)) && ct.merchant_id = 108
LEFT JOIN transactions as r ON r.id = ct.id && r.transaction_type = 'remove'
LEFT JOIN transactions as a ON a.id = ct.id && a.transaction_type = 'add'
WHERE cal.datefield like '2017-%'
GROUP BY month(cal.datefield)
I was surprised to see that the revised statement was about 20x slower than the original with my dataset. Have I missed some sort of logic? Is there a better way to achieve the same result with a more streamlined query, given I am joining the same table multiple times?
EDIT: So to further explain the results I'm looking for - I'd like a single row for each month of the year (12 rows) each with a column for the total transactions, total additions, and total redemptions in each month.
The first query I was getting a result in about 0.5 sec but with the second I was getting results in 9.5sec.
Upvotes: 0
Views: 63
Reputation: 31832
First I would create a derived table with timestamp ranges for every month from your calendar
table. This way a join with the transactions
table will be efficient if date_add
is indexed.
select month(c.datefield) as month,
unix_timestamp(timestamp(min(c.datefield), '00:00:00')) as ts_from,
unix_timestamp(timestamp(max(c.datefield), '23:59:59')) as ts_to
from calendar c
where c.datefield between '2017-01-01' and '2017-12-31'
group by month(c.datefield)
Join it with the transaactions
table and use conditional aggregations to get your data:
select c.month,
sum(t.amount) as transactions,
sum(case when t.transaction_type = 'add' then t.amount else 0 end) as additions,
sum(case when t.transaction_type = 'remove' then t.amount else 0 end) as redemptions
from (
select month(c.datefield) as m, date_format(c.datefield, '%b') as `month`
unix_timestamp(timestamp(min(c.datefield), '00:00:00')) as ts_from,
unix_timestamp(timestamp(max(c.datefield), '23:59:59')) as ts_to
from calendar c
where c.datefield between '2017-01-01' and '2017-12-31'
group by month(c.datefield), date_format(c.datefield, '%b')
) c
left join transactions t on t.date_add between c.ts_from and c.ts_to
where t.merchant_id = 108
group by c.m, c.month
order by c.m
Upvotes: 0
Reputation: 133410
Looking to your query You could use a single left join with case when
SELECT COALESCE(t.transactions, 0) AS transactions,
COALESCE(t.additions, 0) AS additions,
COALESCE(t.redemptions, 0) AS redemptions,
calendar.date
FROM (SELECT DATE_FORMAT(datefield, '%b %Y') AS date
FROM calendar
WHERE datefield LIKE '2017-%'
GROUP BY YEAR(datefield), MONTH(datefield)) AS calendar
LEFT JOIN
( select
COUNT(transaction_type) as transactions
, sum( case when transaction_type = 'add' then 1 else 0 end ) as additions
, sum( case when transaction_type = 'remove' then 1 else 0 end ) as redemptions
, from_unixtime(date_add, '%b %Y') as date_t
FROM transactions
WHERE merchant_id = 108
GROUP BY from_unixtime(date_add, '%b %Y' ) t ON calendar.date = t.date_t
Upvotes: 3