user6064424
user6064424

Reputation:

Query for top N per group code works for MySQL but different result for MariaDB

I have a SQL query which extracts the latest 3 records of each group. The query result for MySQL is different from MariaDB. This query is implemented in the sqlfiddle below

http://sqlfiddle.com/#!9/c09fe/2

Contents of table

CREATE TABLE tmp
    (`mac_addr` varchar(10), `reader_name` varchar(22), `value` numeric, `time_change` datetime)
;

INSERT INTO tmp
    (`mac_addr`, `reader_name`, `value`, `time_change`)
VALUES
    ('''B99A88''', '''name_8''', 1, '2016-07-07 19:21:48'),
    ('''B99A88''', '''own__detect_1''', 1, '2016-06-21 13:30:00'),
    ('''B99A88''', '''own__temperature_1''', 37.4, '2016-05-04 18:23:03'),
    ('''B99A88''', '''own__temperature_1''', 29.4, '2016-05-04 18:19:33'),
    ('''B99A88''', '''own__temperature_1''', 28.4, '2016-05-04 18:17:32'),
    ('''B99A88''', '''own__temperature_1''', 27.4, '2016-05-04 18:04:08'),
    ('''B99A88''', '''own__temperature_1''', 21.4, '2016-05-04 15:11:42'),
    ('''B99A88''', '''own__detect_1''', 0, '2016-04-20 15:22:23'),
    ('''B99A88''', '''own__detect_1''', 1, '2016-04-15 17:39:52'),
    ('''B99A88''', '''own__detect_1''', 0, '2016-04-15 17:39:46'),
    ('''B99A88''', '''own__detect_1''', 1, '2016-04-11 17:34:00'),
    ('''B99A88''', '''own__detect_1''', 1, '2016-04-11 17:33:00'),
    ('''B99A88''', '''own__detect_1''', 0, '2016-04-11 17:33:00'),
    ('''B99A88''', '''own__temperature_1''', 28.4, '2016-04-10 21:20:20'),
    ('''B99A88''', '''own__temperature_1''', 32.5, '2016-04-10 21:00:00'),
    ('''B99A88''', '''own__temperature_1''', 34.2, '2016-04-10 11:29:00')
;

Query to extract latest 3 records of each group.

SELECT mac_addr, reader_name, value, time_change
FROM (
    SELECT t1.*,
           IF(@rn = reader_name, @rowno := @rowno + 1, @rowno := 1) AS rowno,
           @rn := reader_name
    FROM (
        SELECT *
          FROM tmp
        ORDER BY reader_name, time_change DESC
    ) t1
    CROSS JOIN (SELECT @rn := null, @rowno := 0) t2
) t
WHERE rowno <= 3

The result when using MySQL v5.6 is as below;

mac_addr    reader_name             value   time_change
'B99A88'    'name_8'                 1      July, 07 2016 19:21:48
'B99A88'    'own__detect_1'          1      June, 21 2016 13:30:00
'B99A88'    'own__detect_1'          0      April, 20 2016 15:22:23
'B99A88'    'own__detect_1'          1      April, 15 2016 17:39:52
'B99A88'    'own__temperature_1'    37      May, 04 2016 18:23:03
'B99A88'    'own__temperature_1'    29      May, 04 2016 18:19:33
'B99A88'    'own__temperature_1'    28      May, 04 2016 18:17:32

The MySQL result is what I want. However, I am using MariaDB and the result is different from the MySQL result.

The MariaDB result looks like this;

mac_addr    reader_name             value   time_change
'B99A88'    'name_8'                 1      2016-07-07 19:21:48
'B99A88'    'own__detect_1'          1      2016-06-21 13:30:00
'B99A88'    'own__temperature_1'    37      2016-05-04 18:23:03
'B99A88'    'own__temperature_1'    29      2016-05-04 18:19:33
'B99A88'    'own__temperature_1'    28      2016-05-04 18:17:32
'B99A88'    'own__detect_1'          0      2016-04-20 15:22:23
'B99A88'    'own__detect_1'          1      2016-04-15 17:39:52
'B99A88'    'own__detect_1'          0      2016-04-15 17:39:46
'B99A88'    'own__temperature_1'    28      2016-04-10 21:20:20
'B99A88'    'own__temperature_1'    33      2016-04-10 21:00:00
'B99A88'    'own__temperature_1'    34      2016-04-10 11:29:00

How can the query code be modified such that the query output of MariaDB can be the same as MySQL? Would using window functions in MariaDB be a good idea?

Upvotes: 6

Views: 846

Answers (4)

Solarflare
Solarflare

Reputation: 11106

(My)SQL is not required to keep the order of the result of your subquery. You have to order the resultset in the upper query, but in your case, you can actually get rid of the subquery anyway:

SELECT mac_addr, reader_name, value, time_change
FROM (
    SELECT t1.*,
           IF(@rn = reader_name, @rowno := @rowno + 1, @rowno := 1) AS rowno,
           @rn := reader_name
    FROM tmp t1, (SELECT @rn := null, @rowno := 0) t2
    ORDER BY reader_name, time_change DESC
) t
WHERE rowno <= 3;

Just for completeness: This behaviour is specific to the use of variables, and since the result is actually not defined in sql standard, it might change some day (like the optimzation that is causing you trouble right now), but this will very likely not happen, if at all, until window functions are fully supported, so you can ignore this detail. Same is probably true for other ways that force an order, like adding limit 999999999 inside your inner query as Rick suggested, although I could think of some (not yet implemented) optimization paths that could again result in an unspecified order.

Upvotes: 1

Vladimir Baranov
Vladimir Baranov

Reputation: 32695

The classic way of selecting top N per group using standard SQL language constructs is to use ROW_NUMBER:

SELECT
    T.*
FROM
    (
        SELECT *
            ,ROW_NUMBER() OVER (PARTITION BY reader_name ORDER BY time_change DESC) AS rn
        FROM tmp
    ) AS T
WHERE T.rn <= 3
ORDER BY reader_name, time_change DESC;

This query should work in all DBMS that support ROW_NUMBER. MySQL doesn't support it, so people have to use fragile tricks with variables that are MySQL-specific.

Window functions were first introduced in MariaDB 10.2.0. MariaDB has more freedom in optimising queries and this MySQL trick with variables is not reliable any more.

So, answering your question, yes, using window functions in MariaDB would be a good idea.


Another common method of selecting top N per group is using LATERAL join, which is better than ROW_NUMBER when number of groups is small and number of rows in a table is large and you have an appropriate index and a second table with the list of groups. I have no idea whether MariaDB supports LATERAL joins. Looks like it doesn't.

Upvotes: 1

Rick James
Rick James

Reputation: 142298

The query execution is allowed to ignore the ORDER BY in the FROM ( SELECT ... ). This probably the real reason for the difference you are seeing. (I don't think Gordon's answer is relevant.)

The problem is discussed here (4 years ago): https://mariadb.com/kb/en/mariadb/group-by-trick-has-been-optimized-away/ ; that has one solution, via a setting.

Some other solutions are here: http://mysql.rjweb.org/doc.php/groupwise_max ; they are designed to be efficient.

Yet another possible solution is to add a bogus LIMIT with a large number on the subquery.

Upvotes: 4

Gordon Linoff
Gordon Linoff

Reputation: 1269753

The ORDER BY that you are using has two keys:

    ORDER BY reader_name, time_change DESC

However, these keys do not uniquely identify each row. Hence, ordering of rows where the keys are the same is not guaranteed -- not even between two runs of the query on the same database. The normal solution is to add a unique id column as the last ORDER BY key so each row is uniquely identified.

More generally, in SQL, ORDER BY does not use a stable sort. A stable sort is one that retains the original ordering of keys, when keys are the same. The reason is simple. SQL tables and result sets represent unordered sets. There is no initial ordering to keep.

If you have a primary key column, then the ORDER BY would be:

    ORDER BY reader_name, time_change DESC, pk

The rest of the code does not need to change. You only want the sort to be stable.

Upvotes: 3

Related Questions