Alpha2k
Alpha2k

Reputation: 2241

Mysql query optimization with dates

I'm trying to optimize the following query:

    SELECT a2 AS 'b_actual_pair', 
       a1 AS 'c_actual_date', 
       a3 AS 'd_actual_value', 
       b1 AS 'e_1m_date', 
       b3 AS 'f_1m_value',
       c1 AS 'g_2m_date',
       c3 AS 'h_2m_value',
       d1 AS 'i_3m_date',
       d3 AS 'j_3m_value',
       e1 AS 'k_4m_date',
       e3 AS 'l_4m_value',
       f1 AS 'm_5m_date',
       f3 AS 'n_5m_value'
FROM   (SELECT crd.b_date  AS 'a1', 
               crd.c_pair  AS 'a2',
               crd.d_value AS 'a3' 
        FROM   item_raw_data crd 
        WHERE  crd.a_unique_id > ( (SELECT crd.a_unique_id 
                                    FROM   item_raw_data crd 
                                    ORDER  BY crd.a_unique_id DESC 
                                    LIMIT  0, 1) - ((SELECT 
                                   Count(DISTINCT c_pair) 
                                                     FROM   item_raw_data)) ) 
        ORDER  BY crd.b_date DESC) a, 
       (SELECT crd.b_date  AS 'b1', 
               crd.c_pair  AS 'b2',
               crd.d_value AS 'b3' 
        FROM   item_raw_data crd 
        WHERE  crd.b_date < ( (SELECT crd.b_date 
                               FROM   item_raw_data crd 
                               ORDER  BY crd.b_date DESC 
                               LIMIT  0, 1) - INTERVAL 1 minute ) 
               AND crd.a_unique_id > ( (SELECT Max(x.a_unique_id) 
                                        FROM   (SELECT crd.a_unique_id, 
                                                       crd.b_date, 
                                                       crd.c_pair, 
                                                       crd.d_value 
                                                FROM   item_raw_data crd 
                                                WHERE  crd.b_date < ( 
                                                       (SELECT crd.b_date 
                                                        FROM   item_raw_data 
                                                               crd 
                                                        ORDER  BY crd.b_date 
                                                                  DESC 
                                                        LIMIT  0, 1) - 
                                                       INTERVAL 1 minute ) 
                                                ORDER  BY crd.b_date DESC) x) - 
                                           ((SELECT Count(DISTINCT c_pair) 
                                             FROM   item_raw_data)) ) 
        ORDER  BY crd.b_date DESC) b,
       (SELECT crd.b_date  AS 'c1', 
               crd.c_pair  AS 'c2', 
               crd.d_value AS 'c3' 
        FROM   item_raw_data crd 
        WHERE  crd.b_date < ( (SELECT crd.b_date 
                               FROM   item_raw_data crd 
                               ORDER  BY crd.b_date DESC 
                               LIMIT  0, 1) - INTERVAL 2 minute ) 
               AND crd.a_unique_id > ( (SELECT Max(x.a_unique_id) 
                                        FROM   (SELECT crd.a_unique_id, 
                                                       crd.b_date, 
                                                       crd.c_pair, 
                                                       crd.d_value 
                                                FROM   item_raw_data crd 
                                                WHERE  crd.b_date < ( 
                                                       (SELECT crd.b_date 
                                                        FROM   item_raw_data 
                                                               crd 
                                                        ORDER  BY crd.b_date 
                                                                  DESC 
                                                        LIMIT  0, 1) - 
                                                       INTERVAL 2 minute ) 
                                                ORDER  BY crd.b_date DESC) x) - 
                                           ((SELECT Count(DISTINCT c_pair) 
                                             FROM   item_raw_data)) ) 
        ORDER  BY crd.b_date DESC) c,
       (SELECT crd.b_date  AS 'd1', 
               crd.c_pair  AS 'd2', 
               crd.d_value AS 'd3' 
        FROM   item_raw_data crd 
        WHERE  crd.b_date < ( (SELECT crd.b_date 
                               FROM   item_raw_data crd 
                               ORDER  BY crd.b_date DESC 
                               LIMIT  0, 1) - INTERVAL 3 minute ) 
               AND crd.a_unique_id > ( (SELECT Max(x.a_unique_id) 
                                        FROM   (SELECT crd.a_unique_id, 
                                                       crd.b_date, 
                                                       crd.c_pair, 
                                                       crd.d_value 
                                                FROM   item_raw_data crd 
                                                WHERE  crd.b_date < ( 
                                                       (SELECT crd.b_date 
                                                        FROM   item_raw_data 
                                                               crd 
                                                        ORDER  BY crd.b_date 
                                                                  DESC 
                                                        LIMIT  0, 1) - 
                                                       INTERVAL 3 minute ) 
                                                ORDER  BY crd.b_date DESC) x) - 
                                           ((SELECT Count(DISTINCT c_pair) 
                                             FROM   item_raw_data)) ) 
        ORDER  BY crd.b_date DESC) d,
       (SELECT crd.b_date  AS 'e1', 
               crd.c_pair  AS 'e2', 
               crd.d_value AS 'e3' 
        FROM   item_raw_data crd 
        WHERE  crd.b_date < ( (SELECT crd.b_date 
                               FROM   item_raw_data crd 
                               ORDER  BY crd.b_date DESC 
                               LIMIT  0, 1) - INTERVAL 4 minute ) 
               AND crd.a_unique_id > ( (SELECT Max(x.a_unique_id) 
                                        FROM   (SELECT crd.a_unique_id, 
                                                       crd.b_date, 
                                                       crd.c_pair, 
                                                       crd.d_value 
                                                FROM   item_raw_data crd 
                                                WHERE  crd.b_date < ( 
                                                       (SELECT crd.b_date 
                                                        FROM   item_raw_data 
                                                               crd 
                                                        ORDER  BY crd.b_date 
                                                                  DESC 
                                                        LIMIT  0, 1) - 
                                                       INTERVAL 4 minute ) 
                                                ORDER  BY crd.b_date DESC) x) - 
                                           ((SELECT Count(DISTINCT c_pair) 
                                             FROM   item_raw_data)) ) 
        ORDER  BY crd.b_date DESC) e,
       (SELECT crd.b_date  AS 'f1', 
               crd.c_pair  AS 'f2', 
               crd.d_value AS 'f3' 
        FROM   item_raw_data crd 
        WHERE  crd.b_date < ( (SELECT crd.b_date 
                               FROM   item_raw_data crd 
                               ORDER  BY crd.b_date DESC 
                               LIMIT  0, 1) - INTERVAL 5 minute ) 
               AND crd.a_unique_id > ( (SELECT Max(x.a_unique_id) 
                                        FROM   (SELECT crd.a_unique_id, 
                                                       crd.b_date, 
                                                       crd.c_pair, 
                                                       crd.d_value 
                                                FROM   item_raw_data crd 
                                                WHERE  crd.b_date < ( 
                                                       (SELECT crd.b_date 
                                                        FROM   item_raw_data 
                                                               crd 
                                                        ORDER  BY crd.b_date 
                                                                  DESC 
                                                        LIMIT  0, 1) - 
                                                       INTERVAL 5 minute ) 
                                                ORDER  BY crd.b_date DESC) x) - 
                                           ((SELECT Count(DISTINCT c_pair) 
                                             FROM   item_raw_data)) ) 
        ORDER  BY crd.b_date DESC) f        
WHERE  
    a.a2 = b.b2 
        and 
    b.b2 = c.c2
        and 
    c.c2 = d.d2
        and 
    d.d2 = e.e2
        and
    e.e2 = f.f2

The output of this query is the following:

enter image description here

1. The data behind items_raw_data is the following:

-every 5 seconds 110 items are inserted in the database with their current prices

-actual_pair or c_pair is a reference to a main table which contains the full item description, not of much relevance

-luckily the 110 rows are inserted in 2 seconds, leaving a gap of 3 seconds. This makes it easier to build the queries.

2. The aim of this query is to use this data to do a chart with live prices but we need to further expand the query to (example) 10 minutes, 15 minutes, 1 hour, 2 hours etc... So you will be able to see an items price with its value now and few moments back in time.

3. The problem is that this query takes 2.5 seconds to run with 450.000 (few hours data present and we need up to 1 week of data) total data rows and 6 set of data (actual, 1m, 2m, 3m, 4m, 5m).

What have we tried so far:

  1. Using tables as Memory instead of INNO reduced the query times from 2.5 seconds to 2 seconds. The system has 64gb ECC RAM and a 12 core CPU and an NVMe drive, hardware shouldn't be a problem.

  2. Finding all this data for each item individually yields worse results than all the items together.

  3. Doing exactly the same in server-side language (Java) through threaded code is slower also.

  4. Tried using Inner Join's instead of Where, similar result.

An easier to view query:

SELECT crd.b_date  AS 'a1', 
       crd.c_pair  AS 'a2', 
       crd.d_value AS 'a3' 
FROM   items_raw_data crd 
WHERE  crd.a_unique_id > ( (SELECT crd.a_unique_id 
                            FROM   items_raw_data crd 
                            ORDER  BY crd.a_unique_id DESC 
                            LIMIT  0, 1) - ((SELECT Count(DISTINCT c_pair) 
                                             FROM   items_raw_data)) ) 
ORDER  BY crd.b_date DESC

And the results:

enter image description here

This is for the actual prices only.

Table description:

enter image description here

Update 1

Added explain:

enter image description here

Update 2

Here are the links to download the database, with the query also (own hosting server):

Item raw data SQL: https://cloud.technorah.com/index.php/s/sR3mdK2Oos2EbC3

SQL Query: https://cloud.technorah.com/index.php/s/bdndmLGAUfpduif

Update 3

  1. Using @hunteke query gave a result of 4.7 seconds, which was very strange because the query and the advices seem logical.

  2. Using @hunteke's tips we changed the following:

    SELECT a_unique_id FROM item_raw_data ORDER BY a_unique_id DESC LIMIT  0, 1
    

    to

    SELECT MAX(a_unique_id) FROM item_raw_data
    

this lowered the query time from 2.8 seconds to 2.7 seconds. Adding USE INDEX(primary) on main queries further improved the time from 2.7 seconds to 2.6 seconds.

Update 4

We failed at a basic task, using Timestamp instead of int(11) - primary key in the order by. The latest inserted date is also the latest unique id inserted. So changing from ORDER BY crd.b_date to ORDER BY crd.a_unique_id lowered the query by more than 1 SECOND, from 2.6 to 1.3, almost half.

So the actual query looks like this, completly reworked. The execution time changed from 1.3 seconds to 0.55s

SELECT * 
FROM   
        (SELECT sub.a_unique_id AS 'a0', 
            sub.b_date          AS 'a1', 
            sub.c_pair          AS 'a2', 
            sub.d_value         AS 'a3' 
        FROM   (SELECT * 
                FROM   item_raw_data) sub, 
            (SELECT crd.a_unique_id AS 'max_id', 
                    crd.b_date      AS 'xdate' 
                FROM   item_raw_data crd
                ORDER  BY crd.a_unique_id DESC 
                LIMIT  0, 1) aux 
        WHERE  sub.b_date <= aux.xdate 
            AND sub.a_unique_id > ( aux.max_id - (SELECT 
                                        Count(DISTINCT c_pair) AS 
                                        max_rows 
                                                        FROM   item_raw_data) 
                                        ) 
        ORDER  BY sub.a_unique_id DESC) a,


        (SELECT sub.a_unique_id AS 'b0', 
            sub.b_date          AS 'b1', 
            sub.c_pair          AS 'b2', 
            sub.d_value         AS 'b3' 
        FROM   (SELECT * 
                FROM   item_raw_data) sub, 
            (SELECT crd.a_unique_id AS 'max_id', 
                    crd.b_date      AS 'xdate' 
                FROM   item_raw_data crd
                WHERE  crd.b_date < (SELECT ( crdx.b_date - INTERVAL 1 minute ) 
                                    FROM   item_raw_data crdx 
                                    ORDER  BY crdx.a_unique_id DESC 
                                    LIMIT  0, 1) 
                ORDER  BY crd.a_unique_id DESC 
                LIMIT  0, 1) aux 
        WHERE  sub.b_date <= aux.xdate 
            AND sub.a_unique_id > ( aux.max_id - (SELECT 
                                        Count(DISTINCT c_pair) AS 
                                        max_rows 
                                                        FROM   item_raw_data) 
                                        ) 
        ORDER  BY sub.a_unique_id DESC) b, 


    (SELECT sub.a_unique_id AS 'c0', 
            sub.b_date      AS 'c1', 
            sub.c_pair      AS 'c2', 
            sub.d_value     AS 'c3' 
        FROM   (SELECT *
                FROM   item_raw_data) sub, 
            (SELECT crd.a_unique_id AS 'max_id', 
                    crd.b_date      AS 'xdate' 
                FROM   item_raw_data crd
                WHERE  crd.b_date < (SELECT ( crdx.b_date - INTERVAL 2 minute ) 
                                    FROM   item_raw_data crdx 
                                    ORDER  BY crdx.a_unique_id DESC 
                                    LIMIT  0, 1) 
                ORDER  BY crd.a_unique_id DESC 
                LIMIT  0, 1) aux 
        WHERE  sub.b_date <= aux.xdate 
            AND sub.a_unique_id > ( aux.max_id - (SELECT 
                                        Count(DISTINCT c_pair) AS 
                                        max_rows 
                                                        FROM   item_raw_data) 
                                        ) 
        ORDER  BY sub.a_unique_id DESC) c,


    (SELECT sub.a_unique_id AS 'd0', 
            sub.b_date      AS 'd1', 
            sub.c_pair      AS 'd2', 
            sub.d_value     AS 'd3' 
        FROM   (SELECT *
                FROM   item_raw_data) sub, 
            (SELECT crd.a_unique_id AS 'max_id', 
                    crd.b_date      AS 'xdate' 
                FROM   item_raw_data crd
                WHERE  crd.b_date < (SELECT ( crdx.b_date - INTERVAL 3 minute ) 
                                    FROM   item_raw_data crdx 
                                    ORDER  BY crdx.a_unique_id DESC 
                                    LIMIT  0, 1) 
                ORDER  BY crd.a_unique_id DESC 
                LIMIT  0, 1) aux 
        WHERE  sub.b_date <= aux.xdate 
            AND sub.a_unique_id > ( aux.max_id - (SELECT 
                                        Count(DISTINCT c_pair) AS 
                                        max_rows 
                                                        FROM   item_raw_data) 
                                        ) 
        ORDER  BY sub.a_unique_id DESC) d,      


    (SELECT sub.a_unique_id AS 'e0', 
            sub.b_date      AS 'e1', 
            sub.c_pair      AS 'e2', 
            sub.d_value     AS 'e3' 
        FROM   (SELECT *
                FROM   item_raw_data) sub, 
            (SELECT crd.a_unique_id AS 'max_id', 
                    crd.b_date      AS 'xdate' 
                FROM   item_raw_data crd
                WHERE  crd.b_date < (SELECT ( crdx.b_date - INTERVAL 4 minute ) 
                                    FROM   item_raw_data crdx 
                                    ORDER  BY crdx.a_unique_id DESC 
                                    LIMIT  0, 1) 
                ORDER  BY crd.a_unique_id DESC 
                LIMIT  0, 1) aux 
        WHERE  sub.b_date <= aux.xdate 
            AND sub.a_unique_id > ( aux.max_id - (SELECT 
                                        Count(DISTINCT c_pair) AS 
                                        max_rows 
                                                        FROM   item_raw_data) 
                                        ) 
        ORDER  BY sub.a_unique_id DESC) e,


    (SELECT sub.a_unique_id AS 'f0', 
            sub.b_date      AS 'f1', 
            sub.c_pair      AS 'f2', 
            sub.d_value     AS 'f3' 
        FROM   (SELECT *
                FROM   item_raw_data) sub, 
            (SELECT crd.a_unique_id AS 'max_id', 
                    crd.b_date      AS 'xdate' 
                FROM   item_raw_data crd
                WHERE  crd.b_date < (SELECT ( crdx.b_date - INTERVAL 5 minute ) 
                                    FROM   item_raw_data crdx 
                                    ORDER  BY crdx.a_unique_id DESC 
                                    LIMIT  0, 1) 
                ORDER  BY crd.a_unique_id DESC 
                LIMIT  0, 1) aux 
        WHERE  sub.b_date <= aux.xdate 
            AND sub.a_unique_id > ( aux.max_id - (SELECT 
                                        Count(DISTINCT c_pair) AS 
                                        max_rows 
                                                        FROM   item_raw_data) 
                                        ) 
        ORDER  BY sub.a_unique_id DESC) f

WHERE  
    a.a2 = b.b2
        AND
    b.b2 = c.c2 
        AND 
    c.c2 = d.d2
        AND
    d.d2 = e.e2
        AND
    e.e2 = f.f2

While this is good, lowering from 1.3s almost to 0.55s, we can now use this but we are still looking for further improvements, not only to yield a better result, but an more in-depth understanding of optimization on such big queries and MySQL. Will keep updating the query execution time as table grows.

Upvotes: 1

Views: 85

Answers (2)

hunteke
hunteke

Reputation: 3716

The EXPLAIN output says that the optimizer is choosing to ignore any indexes for 4 of the PRIMARY selections. I can't say with certainty why it's not, but I have a strong suspicion it's to do with the heavyish use of subqueries: MySQL is notoriously bad at optimizing subqueries, and the general advice is "don't use them." Not always tenable.

So the improvements that I can offer now:

  1. When you need the maximum, use MAX. Don't ORDER BY and LIMIT. The ORDER BY requires O(n log n) operations to sort, while the MAX function's worst case requires just O(n) to find the maximum value (scan through the entire table). If you have an index on that column with high-cardinality, then MAX will likely take even less, at O(log n). You can see this slow down via all the filesort's in the EXPLAIN output.

    -- okay
    SELECT a_unique_id FROM item_raw_data ORDER BY a_unique_id DESC LIMIT  0, 1
    
    -- better
    SELECT MAX(a_unique_id) FROM item_raw_data
    
  2. There should be no need to select columns that aren't utilized in subqueries. I would assume the query optimizer would get rid of them, but I have assumed wrong with MySQL before. Especially when it comes to subqueries. So, with a grain of salt:

    -- okay
    SELECT MAX(a_unique_id)
    FROM (SELECT a_unique_id, b_date, c_pair, d_value FROM ...)
    
    -- better
    SELECT MAX(a_unique_id)
    FROM (SELECT a_unique_id FROM ...)
    
  3. The MySQL optimizer is ignoring indexes. You might give MySQL a strong hint to use the proper indexes with USE INDEX(<indexname>). Doing so results in a far simpler query plan that also makes better use of indexes:

    +----+-------------+-------+------------+-------+-------------------------+-------------------------+---------+------+--------+----------+----------------------------------------------------+
    | id | select_type | table | partitions | type  | possible_keys           | key                     | key_len | ref  | rows   | filtered | Extra                                              |
    +----+-------------+-------+------------+-------+-------------------------+-------------------------+---------+------+--------+----------+----------------------------------------------------+
    |  1 | PRIMARY     | bb    | NULL       | range | PRIMARY                 | PRIMARY                 | 4       | NULL |   1540 |    33.33 | Using where                                        |
    |  1 | PRIMARY     | aa    | NULL       | range | PRIMARY                 | PRIMARY                 | 4       | NULL |    110 |     0.90 | Using where; Using join buffer (Block Nested Loop) |
    |  1 | PRIMARY     | cc    | NULL       | range | PRIMARY                 | PRIMARY                 | 4       | NULL |   2860 |     0.30 | Using where; Using join buffer (Block Nested Loop) |
    |  1 | PRIMARY     | dd    | NULL       | range | PRIMARY                 | PRIMARY                 | 4       | NULL |   7568 |     0.30 | Using where; Using join buffer (Block Nested Loop) |
    |  1 | PRIMARY     | ee    | NULL       | range | PRIMARY                 | PRIMARY                 | 4       | NULL |  10188 |     0.30 | Using where; Using join buffer (Block Nested Loop) |
    |  1 | PRIMARY     | ff    | NULL       | range | PRIMARY                 | PRIMARY                 | 4       | NULL |  12972 |     0.30 | Using where; Using join buffer (Block Nested Loop) |
    | 21 | SUBQUERY    | ll    | NULL       | ALL   | RAW_DATA_PAIR_UNIQUE_ID | NULL                    | NULL    | NULL | 499198 |    50.00 | Using where                                        |
    | 17 | SUBQUERY    | kk    | NULL       | ALL   | RAW_DATA_PAIR_UNIQUE_ID | NULL                    | NULL    | NULL | 499198 |    50.00 | Using where                                        |
    | 13 | SUBQUERY    | jj    | NULL       | ALL   | RAW_DATA_PAIR_UNIQUE_ID | NULL                    | NULL    | NULL | 499198 |    50.00 | Using where                                        |
    |  9 | SUBQUERY    | ii    | NULL       | ALL   | RAW_DATA_PAIR_UNIQUE_ID | NULL                    | NULL    | NULL | 499198 |    50.00 | Using where                                        |
    |  5 | SUBQUERY    | hh    | NULL       | ALL   | RAW_DATA_PAIR_UNIQUE_ID | NULL                    | NULL    | NULL | 499198 |    50.00 | Using where                                        |
    |  3 | SUBQUERY    | gg    | NULL       | index | RAW_DATA_PAIR_UNIQUE_ID | RAW_DATA_PAIR_UNIQUE_ID | 4       | NULL | 499198 |   100.00 | Using index                                        |
    +----+-------------+-------+------------+-------+-------------------------+-------------------------+---------+------+--------+----------+----------------------------------------------------+
    
  4. And finally, I've reorganizing the query so it's much more obvious (if you scroll right) how the sections differ (the INTERVALs) and where the USE INDEX's are. The aa, bb, cc (etc.) tables names are there merely to identify where each part maps to the query plan (above):

    SELECT
      a2 AS 'b_actual_pair', a1 AS 'c_actual_date', a3 AS 'd_actual_value',
      b1 AS 'e_1m_date', b3 AS 'f_1m_value',
      c1 AS 'g_2m_date', c3 AS 'h_2m_value',
      d1 AS 'i_3m_date', d3 AS 'j_3m_value',
      e1 AS 'k_4m_date', e3 AS 'l_4m_value',
      f1 AS 'm_5m_date', f3 AS 'n_5m_value'
    FROM
      (SELECT b_date AS a1, c_pair AS a2, d_value AS a3 FROM item_raw_data aa USE INDEX(PRIMARY) WHERE a_unique_id > (SELECT MAX(a_unique_id) - COUNT(DISTINCT c_pair) FROM item_raw_data gg USE INDEX(RAW_DATA_PAIR_UNIQUE_ID)) ORDER BY b_date DESC) AS a,
      (SELECT b_date AS b1, c_pair AS b2, d_value AS b3 FROM item_raw_data bb USE INDEX(PRIMARY) WHERE a_unique_id > (SELECT MAX(a_unique_id) - COUNT(DISTINCT c_pair) FROM item_raw_data hh WHERE b_date < (SELECT MAX(b_date) FROM item_raw_data mm) - INTERVAL 1 minute) AND b_date < (SELECT MAX(b_date) FROM item_raw_data mm) - INTERVAL 1 minute ORDER BY b_date DESC) AS b,
      (SELECT b_date AS c1, c_pair AS c2, d_value AS c3 FROM item_raw_data cc USE INDEX(PRIMARY) WHERE a_unique_id > (SELECT MAX(a_unique_id) - COUNT(DISTINCT c_pair) FROM item_raw_data ii WHERE b_date < (SELECT MAX(b_date) FROM item_raw_data mm) - INTERVAL 2 minute) AND b_date < (SELECT MAX(b_date) FROM item_raw_data mm) - INTERVAL 2 minute ORDER BY b_date DESC) AS c,
      (SELECT b_date AS d1, c_pair AS d2, d_value AS d3 FROM item_raw_data dd USE INDEX(PRIMARY) WHERE a_unique_id > (SELECT MAX(a_unique_id) - COUNT(DISTINCT c_pair) FROM item_raw_data jj WHERE b_date < (SELECT MAX(b_date) FROM item_raw_data mm) - INTERVAL 3 minute) AND b_date < (SELECT MAX(b_date) FROM item_raw_data mm) - INTERVAL 3 minute ORDER BY b_date DESC) AS d,
      (SELECT b_date AS e1, c_pair AS e2, d_value AS e3 FROM item_raw_data ee USE INDEX(PRIMARY) WHERE a_unique_id > (SELECT MAX(a_unique_id) - COUNT(DISTINCT c_pair) FROM item_raw_data kk WHERE b_date < (SELECT MAX(b_date) FROM item_raw_data mm) - INTERVAL 4 minute) AND b_date < (SELECT MAX(b_date) FROM item_raw_data mm) - INTERVAL 4 minute ORDER BY b_date DESC) AS e,
      (SELECT b_date AS f1, c_pair AS f2, d_value AS f3 FROM item_raw_data ff USE INDEX(PRIMARY) WHERE a_unique_id > (SELECT MAX(a_unique_id) - COUNT(DISTINCT c_pair) FROM item_raw_data ll WHERE b_date < (SELECT MAX(b_date) FROM item_raw_data mm) - INTERVAL 5 minute) AND b_date < (SELECT MAX(b_date) FROM item_raw_data mm) - INTERVAL 5 minute ORDER BY b_date DESC) AS f
    WHERE
        a.a2 = b.b2
    AND a.a2 = c.c2
    AND a.a2 = d.d2
    AND a.a2 = e.e2
    AND a.a2 = f.f2
    

And sure enough, the better use of indexes improves the speed. It's not instant, but it takes 50% of the time of the original query on my laptop. Meanwhile, note that if you're trying to meet a 5 second budget for data going back a week, you may still be hosed. A warning that you may need to rethink your approach.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269693

You can definitely simplify the where clause by doing:

WHERE crd.a_unique_id > (SELECT MAX(crd.a_unique_id) - COUNT(DISTINCT c_pair)
                         FROM items_raw_data crd
                        ) 

This might help a bit with performance. You can also move the subquery to the FROM clause to ensure that it is executed only once.

Upvotes: 0

Related Questions