Reputation: 2845
In order to get the most recent record of a certain combination of identifiers, I use the following query:
SELECT t1.*
FROM (
SELECT id, b_id, c_id
FROM a
ORDER BY epoch DESC
LIMIT 18446744073709551615
) AS t1
GROUP BY t1.b_id, t1.c_id
If there are multiple records of a combination of b_id
+ c_id
, then it will always select the one with the highest value of epoch
(and as such, the latest in time).
The LIMIT
is added as a workaround to force MariaDB to actually order the results. I successfully use this construction a lot in my application, and so have others.
However, now I came across an exact same query in my application, where I "accidentally" used more columns than strictly necessary in the sub-query:
SELECT t1.*
FROM (
SELECT id, b_id, c_id, and, some, other, columns, ...
FROM a
ORDER BY epoch DESC
LIMIT 18446744073709551615
) AS t1
GROUP BY t1.b_id, t1.c_id
I've tested both queries. And the exact same query, but with as only change those additional columns, makes the result to become incorrect. In fact, the number of columns determines the result. If I have <= 28 columns, the result is okay. If I have 29 columns, then it gives the third-latest record (which is wrong too), and if I have 30-36 columns it always gives the second-latest record (36 is the total number for table a
). In my testing, it didn't seem to matter which particular column was removed or added.
I'm having a hard time finding out why exactly the behavior changes after I add more columns. Also, perhaps by chance, it still gave the correct result yesterday. But today suddenly the result changed, probably after new records (with unrelated identifiers) were added to table a
. I've tried using EXPLAIN
:
# The first query, with columns: id, b_id, c_id
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 280 Using where; Using temporary; Using filesort
2 DERIVED a ALL NULL NULL NULL NULL 280 Using filesort
# The second query, with columns: id, b_id, c_id, and, some, other, columns, ...
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 276 Using where; Using temporary; Using filesort
2 DERIVED a ALL NULL NULL NULL NULL 276 Using filesort
But that doesn't really help me much, other than that I can see that the key_len
is different. The second-latest record that is incorrectly received in the second query is one where id = 276
, the actual latest record that it correctly retrieves using the first query is one where id = 278
. In total there are 307 rows now, and yesterday perhaps just ~300. I'm not sure how to interpret these results to understand what is going wrong. Does anyone know? And if not, what else can I do to find out what is causing these strange results?
Upvotes: 0
Views: 424
Reputation: 1269753
This is a malformed query and should be generating a syntax error:
SELECT t1.*
FROM (SELECT id, b_id, c_id
FROM a
ORDER BY epoch DESC
LIMIT 18446744073709551615
) t1
GROUP BY t1.b_id, t1.c_id;
Why? You are selecting 3 columns with no aggregation functions. But the group by
only has two columns. Happily, this is now a syntax error in MySQL, using the default settings. Finally! (MySQL accepted this non-standard syntax until version 8.0.)
You can do what you want using a correlated subquery:
select a.*
from a
where a.epoch = (select max(a2.epoch)
from a a2
where a2.b_id = a.b_id and a2.c_id = a.c_id
);
With an index on a(b_id, c_id, epoch)
, this is probably also faster than aggregation -- even if that happened to work under some circumstances.
Upvotes: 1
Reputation: 222462
Why not use window functions rather than this dirty workaround, that relies on MySQL/MariaDB non-standard behavior regarding group by
?
select *
from (
select a.*, row_number() over(partition by b_id, c_id order by epoch desc) rn
from a
) a
where rn = 1
This works in MySQL 8.0 and Maria DB 10.2 or higher. In earlier versions, one alternative is a correlated subquery:
select *
from a
where epoch = (select max(a1.epoch) from a a1 where a1.b_id = a.b_id and a1.c_id = a.c_id)
Upvotes: 1