contool
contool

Reputation: 1074

Optimise MySQL - JOIN vs Nested query

I have been trying to optimise some SQL queries based on the assumption that Joining tables is more efficient than nesting queries. I am joining the same table multiple times to perform a different analysis on the data.

I have 2 tables:

transactions:

id    |   date_add    |   merchant_ id    | transaction_type      |     amount
1         1488733332          108                  add                     20.00
2         1488733550          108                 remove                   5.00

and a calendar table which just lists dates so that I can create empty records where there are no transactions on particular days:

calendar:

id     |    datefield
1           2017-03-01
2           2017-03-02
3           2017-03-03
4           2017-03-04

I have many thousands of rows in the transactions table, and I'm trying to get an annual summary of total and different types of transactions per month (i.e 12 rows in total), where

result:

month     |    transactions     |    additions  |   redemptions
Jan                15                  12               3
Feb                20                  15               5
...

My initial query looks like this:

SELECT  COALESCE(tr.transactions, 0) AS transactions, 
        COALESCE(ad.additions, 0) AS additions, 
        COALESCE(re.redemptions, 0) AS redemptions, 
        calendar.date 
FROM (SELECT DATE_FORMAT(datefield, '%b %Y') AS date FROM calendar WHERE datefield LIKE '2017-%' GROUP BY YEAR(datefield), MONTH(datefield)) AS calendar 
LEFT JOIN (SELECT COUNT(transaction_type) as transactions, from_unixtime(date_add, '%b %Y') as date_t FROM transactions WHERE merchant_id = 108  GROUP BY from_unixtime(date_add, '%b %Y')) AS tr
ON calendar.date = tr.date_t
LEFT JOIN (SELECT COUNT(transaction_type = 'add') as additions, from_unixtime(date_add, '%b %Y') as date_a FROM transactions WHERE merchant_id = 108  AND transaction_type = 'add' GROUP BY from_unixtime(date_add, '%b %Y')) AS ad
ON calendar.date = ad.date_a
LEFT JOIN (SELECT COUNT(transaction_type = 'remove') as redemptions, from_unixtime(date_add, '%b %Y') as date_r FROM transactions WHERE merchant_id = 108  AND transaction_type = 'remove' GROUP BY from_unixtime(date_add, '%b %Y')) AS re
ON calendar.date = re.date_r

I tried optimising and cleaning it up a little, removing the nested statements and came up with this:

SELECT 
    DATE_FORMAT(cal.datefield, '%b %d') as date,
    IFNULL(count(ct.amount),0) as transactions, 
    IFNULL(count(a.amount),0) as additions, 
    IFNULL(count(r.amount),0) as redeptions
FROM calendar as cal 
LEFT JOIN transactions as ct ON cal.datefield = date(from_unixtime(ct.date_add))  && ct.merchant_id = 108
LEFT JOIN transactions as r ON r.id = ct.id && r.transaction_type = 'remove'
LEFT JOIN transactions as a ON a.id = ct.id && a.transaction_type = 'add' 
WHERE cal.datefield like '2017-%'
GROUP BY month(cal.datefield)

I was surprised to see that the revised statement was about 20x slower than the original with my dataset. Have I missed some sort of logic? Is there a better way to achieve the same result with a more streamlined query, given I am joining the same table multiple times?

EDIT: So to further explain the results I'm looking for - I'd like a single row for each month of the year (12 rows) each with a column for the total transactions, total additions, and total redemptions in each month.

The first query I was getting a result in about 0.5 sec but with the second I was getting results in 9.5sec.

Upvotes: 0

Views: 63

Answers (2)

Paul Spiegel
Paul Spiegel

Reputation: 31832

First I would create a derived table with timestamp ranges for every month from your calendar table. This way a join with the transactions table will be efficient if date_add is indexed.

select month(c.datefield) as month, 
       unix_timestamp(timestamp(min(c.datefield), '00:00:00')) as ts_from,
       unix_timestamp(timestamp(max(c.datefield), '23:59:59')) as ts_to
from calendar c
where c.datefield between '2017-01-01' and '2017-12-31'
group by month(c.datefield)

Join it with the transaactions table and use conditional aggregations to get your data:

select c.month,
       sum(t.amount) as transactions,
       sum(case when t.transaction_type = 'add'    then t.amount else 0 end) as additions,
       sum(case when t.transaction_type = 'remove' then t.amount else 0 end) as redemptions
from (
    select month(c.datefield) as m, date_format(c.datefield, '%b') as `month`
           unix_timestamp(timestamp(min(c.datefield), '00:00:00')) as ts_from,
           unix_timestamp(timestamp(max(c.datefield), '23:59:59')) as ts_to
    from calendar c
    where c.datefield between '2017-01-01' and '2017-12-31'
    group by month(c.datefield), date_format(c.datefield, '%b')
) c
left join transactions t on t.date_add between c.ts_from and c.ts_to
where t.merchant_id = 108
group by c.m, c.month
order by c.m

Upvotes: 0

ScaisEdge
ScaisEdge

Reputation: 133410

Looking to your query You could use a single left join with case when

SELECT  COALESCE(t.transactions, 0) AS transactions, 
        COALESCE(t.additions, 0) AS additions, 
        COALESCE(t.redemptions, 0) AS redemptions, 
        calendar.date 
FROM (SELECT DATE_FORMAT(datefield, '%b %Y') AS date 
          FROM calendar 
          WHERE datefield LIKE '2017-%' 
          GROUP BY YEAR(datefield), MONTH(datefield)) AS calendar 
LEFT JOIN 
 ( select 
      COUNT(transaction_type) as transactions
      , sum( case when transaction_type = 'add' then 1 else 0 end ) as additions
      , sum( case when transaction_type = 'remove' then 1 else 0 end ) as redemptions
      ,  from_unixtime(date_add, '%b %Y') as date_t 
      FROM transactions 
      WHERE merchant_id = 108  
      GROUP BY from_unixtime(date_add, '%b %Y' ) t ON calendar.date = t.date_t

Upvotes: 3

Related Questions