Jimmyt1988
Jimmyt1988

Reputation: 21116

Mysql where in not working as expected

I have the following query:

SELECT 

`tests`.`id`, 
`tests`.`created_at`, 
`tests`.`updated_at`, 
`tests`.`created_by`, 
`tests`.`date_of_test`, 
`tests`.`location`, 
`tests`.`information`, 
`tests`.`title`, 
`tests`.`goals`, 
`tests`.`deleted_at`, 
`tests`.`status`, 
`tests`.`tester`, 
`tests`.`test_approach`

FROM `tests`

WHERE

`tests`.`id` IN (
    SELECT `test_wobble`.`test_id`
    FROM `project_wobble`
    INNER JOIN `wobbles` ON `project_wobble`.`wobble_id` = `wobbles`.`id`
    INNER JOIN `wobble_profiles` ON `wobble_profiles`.`wobble_id` = `wobbles`.`id`
    INNER JOIN `wobble_profile_user` ON `wobble_profile_user`.`wobble_profile_id` = `wobble_profiles`.`id`
    INNER JOIN `test_wobble` ON `test_wobble`.`wobble_id` = `wobbles`.`id`
    WHERE `project_wobble`.`project_id` = '2' AND `wobble_profile_user`.`user_id` = '3'
    GROUP BY `wobbles`.`id`
) 

GROUP BY `tests`.`id`
ORDER BY tests.date_of_test DESC

If I run the IN query on its own, it returns

When i run the whole above query, I get

If I replace the IN query with the number 13... The SQL returns 1 result (The correct one).

What am i doing wrong here?

Upvotes: 2

Views: 677

Answers (3)

axiac
axiac

Reputation: 72177

The inner query exposes undefined behaviour. It is explained in the documentation on the page MySQL Handling of GROUP BY.

According to the SQL standard, the inner query is invalid. To be valid, all the columns that appear in the SELECT, HAVING and ORDER BY clauses must satisfy one of the following:

  • they appear in the GROUP BY clause;
  • are used (in SELECT, HAVING or ORDER BY) only as parameters of aggregate functions;
  • are functionally dependent on the GROUP BY columns.

For example, using your tables, you can put in the SELECT clause:

  • wobbles.id - because it appears in the GROUP BY clause;
  • COUNT(DISTINCT project_wobble.project_id) - even if project_wobble.project_id does not appear in GROUP BY, it can be used as a parameter of the aggregate function COUNT(DISTINCT);
  • any column of table wobbles, given that the column id is its PK - all the columns of table wobbles are functionally dependent on wobbles.id (their values are uniquely determined by the value of wobbles.id).

Before version 5.7.5, MySQL accepts queries that do not follow the above requirements but, as the documentation states:

In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate, which is probably not what you want.

Starting with version 5.7.5, MySQL implements detection of functional dependence as an configurable feature (which is turned ON by default).

On 5.7.5 your inner query will trigger an error and that's all; your query is invalid, so it doesn't run at all.

On previous versions (also on 5.7.5 if the ONLY_FULL_GROUP_BY SQL mode is disabled), the query runs but its results are unpredictable. They can change from one execution to the next if, for example, a row is deleted then re-inserted.

Because the MySQL query optimizer re-organizes your whole query for better execution plan, when it is embedded in the larger query its execution is not the same as when it is ran standalone. This is another way you can observe its undefined behaviour.

How to fix your query

Extract the inner query, remove the GROUP BY clause, add more columns to the SELECT clause and look at what it produces:

SELECT DISTINCT `test_wobble`.`wobble_id`, `test_wobble`.`test_id`
FROM `project_wobble`
INNER JOIN `wobbles` ON `project_wobble`.`wobble_id` = `wobbles`.`id`
INNER JOIN `wobble_profiles` ON `wobble_profiles`.`wobble_id` = `wobbles`.`id`
INNER JOIN `wobble_profile_user` ON `wobble_profile_user`.`wobble_profile_id` = `wobble_profiles`.`id`
INNER JOIN `test_wobble` ON `test_wobble`.`wobble_id` = `wobbles`.`id`
WHERE `project_wobble`.`project_id` = '2' AND `wobble_profile_user`.`user_id` = '3'

If I'm not wrong, it will produce two rows having the same wobble_id and values 13 and 14 for column test_id.

If this result set is correct then you can remove test_wobble.wobble_id from SELECT, keep DISTINCT and put the query into the larger one.

There is no need for GROUP BY (because of the DISTINCT) and it should work faster without it.

Upvotes: 0

Quassnoi
Quassnoi

Reputation: 425251

This query:

SELECT `test_wobble`.`test_id`
FROM `project_wobble`
INNER JOIN `wobbles` ON `project_wobble`.`wobble_id` = `wobbles`.`id`
INNER JOIN `wobble_profiles` ON `wobble_profiles`.`wobble_id` = `wobbles`.`id`
INNER JOIN `wobble_profile_user` ON `wobble_profile_user`.`wobble_profile_id` = `wobble_profiles`.`id`
INNER JOIN `test_wobble` ON `test_wobble`.`wobble_id` = `wobbles`.`id`
WHERE `project_wobble`.`project_id` = '2' AND `wobble_profile_user`.`user_id` = '3'
GROUP BY `wobbles`.`id`

groups by wobbles.id but returns test_wobble.test_id which is not a part of GROUP BY.

On each iteration, MySQL pushes the IN field into this query:

SELECT `test_wobble`.`test_id`
FROM `project_wobble`
INNER JOIN `wobbles` ON `project_wobble`.`wobble_id` = `wobbles`.`id`
INNER JOIN `wobble_profiles` ON `wobble_profiles`.`wobble_id` = `wobbles`.`id`
INNER JOIN `wobble_profile_user` ON `wobble_profile_user`.`wobble_profile_id` = `wobble_profiles`.`id`
INNER JOIN `test_wobble` ON `test_wobble`.`wobble_id` = `wobbles`.`id`
WHERE `project_wobble`.`project_id` = '2' AND `wobble_profile_user`.`user_id` = '3'

-- This is implicitly added by MySQL when optimizing
AND `test_wobble`.`test_id` = `tests`.`id`
GROUP BY `wobbles`.`id`

and then just checks if some value exists.

If you remove the GROUP BY from your IN query, you'll see that it contains both 13 and 14, but only one of those is returned when you run the query with GROUP BY.

You can also try running the second query, substituting 13 and 14 instead of tests.id and make sure the query returns something in both cases.

This might actually be considered a bug in MySQL. However, since the documentation does not specify which ungrouped and unaggregated expression will be returned from a grouped query, it's better to specify it explicitly, of side effects from the optimizer will kick in like the do in this case.

Could you please provide some sample of your data and outline what are you going to achieve with the query?

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269445

It is a little bit hard to tell without knowing what the data is. But, you do have an issue in the subquery. This is your subquery:

SELECT `test_wobble`.`test_id`
FROM `project_wobble` INNER JOIN
      `wobbles`
      ON `project_wobble`.`wobble_id` = `wobbles`.`id` INNER JOIN
      `wobble_profiles`
      ON `wobble_profiles`.`wobble_id` = `wobbles`.`id` INNER JOIN
      `wobble_profile_user`
      ON `wobble_profile_user`.`wobble_profile_id` = `wobble_profiles`.`id` INNER JOIN
      `test_wobble`
      ON `test_wobble`.`wobble_id` = `wobbles`.`id`
WHERE `project_wobble`.`project_id` = '2' AND `wobble_profile_user`.`user_id` = '3'
GROUP BY `wobbles`.`id`

Note the select and group by. These have different variables:

`test_wobble`.`test_id`
`wobbles`.`id`

I'm not sure which one you really want. But MySQL returns an indeterminate value when you run the query -- and a value that can change from one run to the next. You should fix the select and group by so they match.

Upvotes: 0

Related Questions