Mr. Shiny and New 安宇
Mr. Shiny and New 安宇

Reputation: 13908

MySQL Join giving inconsistent results

I'm trying to plan my meals:

select * from 
    (select floor(rand() * 3) + 1 as rand_id, days1.* from (
          select 'Monday' as dy from dual
    union select 'Tuesday' from dual
    union select 'Wednesday' from dual
    union select 'Thursday' from dual
    union select 'Friday' from dual
    ) days1) days
left join 
    (select id as rand_id, meals1.* from  (
          select 1 as id, 'Pizza' as dinner  from dual
    union select 2, 'Hotdogs' from dual
    union select 3, 'Spaghetti' from dual)meals1) meals
on days.rand_id = meals.rand_id;

When I run this query on SQL Fiddle it works fine, but when I try it with my local mysql instance I get total gibberish results: a random number of rows with a random joining:

+---------+-----------+---------+------+---------+
| rand_id | dy        | rand_id | id   | dinner  |
+---------+-----------+---------+------+---------+
|       1 | Wednesday |       2 |    2 | Hotdogs |
|       1 | Monday    |    NULL | NULL | NULL    |
|       3 | Tuesday   |    NULL | NULL | NULL    |
|       3 | Friday    |    NULL | NULL | NULL    |
+---------+-----------+---------+------+---------+

or

+---------+-----------+---------+------+-----------+
| rand_id | dy        | rand_id | id   | dinner    |
+---------+-----------+---------+------+-----------+
|       3 | Wednesday |       1 |    1 | Pizza     |
|       1 | Wednesday |       3 |    3 | Spaghetti |
|       2 | Thursday  |    NULL | NULL | NULL      |
|       3 | Friday    |    NULL | NULL | NULL      |
+---------+-----------+---------+------+-----------+

What I expect to see is 5 rows, each row with a random number between 1 and 3 in rand_id, both days.rand_id and meals.rand_id to be the same. I expect that each time I run the query I get one row for each day of the week with a randomly selected meal. What could be going wrong in my local mysql (but not sqlfiddle's mysql) that is giving me this output?

(Note: the original goal was to randomly link real customer records with fake customer data to generate test data but I've simplified for this example)

Upvotes: 2

Views: 200

Answers (1)

Solarflare
Solarflare

Reputation: 11096

This seems to be a bug related to joining using rand(). It's probably the same bug as describe in Bug #84573 Call to rand() in a [condition] can cause an empty set to be erroneously returned from Jan 2017, although I am not sure if it got the right level of attention, so maybe re-report it. You can use the code below for it.

Reduced code to reproduce the bug in MySQL 5.6, 5.7 and 8.0 (but not in 5.5 and earlier):

create table a (id int primary key);
insert into a values (1), (2);

create table b (id int primary key);
insert into b values (1);

select * from a left join b on rand(0) > 0.5;
+----+------+
| id | id   |
+----+------+
|  2 |    1 |
|  1 | NULL |
+----+------+
2 rows in set (0.00 sec)    

select * from a left join b on rand(1) > 0.5;
+----+------+
| id | id   |
+----+------+
|  1 | NULL |
+----+------+
1 row in set (0.00 sec)

select * from a left join b on rand(14) > 0.5;
Empty set (0.00 sec)

The expected result for all queries would be to always get both (left) rows and randomly null or 1 in the second column.

That reduced code will produce the bug on SQL Fiddle (which uses MySQL 5.6) too. The reason why your query works on SQL Fiddle seems to be that MySQL 5.6 will materialize your subquery (and not merge it), while 5.7 will merge it by default.

So a workaround in your case could be anything that materializes the subquery that uses rand() (although the bug is probably not directly related to materialization). An easy way to set/switch that behaviour is to use a view, so try e.g. in MySQL 5.7 (otherwise you cannot use a subquery in a view):

create algorithm=merge view view_days1
as select floor(rand() * 3) + 1 as rand_id, days1.* from (
          select 'Monday' as dy from dual
    union select 'Tuesday' from dual
    union select 'Wednesday' from dual
    union select 'Thursday' from dual
    union select 'Friday' from dual
    ) days1;

select * from view_days1 days
left join 
    (select id as rand_id, meals1.* from  (
          select 1 as id, 'Pizza' as dinner  from dual
    union select 2, 'Hotdogs' from dual
    union select 3, 'Spaghetti' from dual)meals1) meals
on days.rand_id = meals.rand_id;

will show the same wrong behaviour, while using

create algorithm=temptable view view_days1 as ...

should work just fine.

Upvotes: 1

Related Questions