Sven
Sven

Reputation: 11

MySQL: strange behavior with GROUP BY in WHERE subselect

I hope you can help me with that topic.

I have one table, the relevant fields are VARCHAR id, VARCHAR name and date

3DF0001AB    TESTING_1    2017-04-04
3DF0002ZG    TESTING_2    2017-04-03
3DF0003ER    TESTING_1    2017-04-01
3DF0004XY    TESTING_1    2017-03-26
3DF0005UO    TESTING_3    2017-03-25

The goal is to retrieve two entries for every name (>500), sorted by date. As I can just use database queries I tried following approach. Get one id for every name, UNION the result with the same query, but excluding the ids from the first set.

First step was to get one entry for every name. Result as expected, one id for every name.

SELECT id FROM table GROUP BY name;

Second step; using the above statement in the WHERE clause to receive results, that are not in the first result:

SELECT id FROM table WHERE id NOT IN (SELECT id FROM table GROUP BY name)

But here the result is empty, then I tried to invert the WHERE by using WHERE id IN instead of NOT IN. Expected result was that the same ids would show up when just using the subquery, result was all ids from the table. So I assume that the subquery delivers a wrong result, because when I copy the ids manually -> id IN ("3DF0001AB", ...) it works.

So maybe someone can explain the behavior and/or help to find a solution for the original problem.

Upvotes: 1

Views: 113

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269973

This is a really bad practice:

SELECT id
FROM table
GROUP BY name;

Although MySQL allows this construct, the returned id is from an indeterminate row. You can even get different rows when you run the same query at different times.

A better approach is to use an aggregation function:

SELECT MAX(id)
FROM table
GROUP BY name;

Your real problem, though, is slightly different. When you use NOT IN, no rows are returned if any value in the IN list is NULL. That is how NOT IN is defined.

I would recommend using NOT EXISTS or LEFT JOIN instead, because their behavior is more intuitive:

SELECT t.id
FROM table t LEFT JOIN
     (SELECT MAX(id) as id
      FROM table t2
      GROUP BY name
     ) tt
     ON t.id = tt.id
WHERE tt.id IS NULL;

Upvotes: 1

Related Questions