gruvin
gruvin

Reputation: 51

Why this MySQL UPDATE with sub-query is acting weird?

I think I'm losing it! :-/

So, I have this MySQL UPDATE query ...

UPDATE `Jobs`
SET `Archived` = 'Y'
WHERE `JobCode` IN (
    SELECT `JobCode` FROM (
        SELECT m.`JobCode`
        FROM `Jobs` as jx
        JOIN (
            SELECT j.`JobCode`, MAX(c.`CutDate`) AS MaxCutDate
            FROM `Cuts` AS c
            LEFT JOIN `Jobs` AS j ON (j.`JobCode`=c.`JobCode`)
            GROUP BY j.`JobCode`
        ) AS m ON (m.`JobCode`=jx.`JobCode`)
        WHERE m.MaxCutDate < '2014-01-01'
    ) AS tmp_table
)

The tables Customers, Jobs and Cuts are part of a simple relational group. Customers have multiple Jobs, Jobs have multiple Cuts. I have left this complexity in for my question here, because I am not sure if it is part of the problem or not and didn't want to over simplify the problem away.

The query's intention is to set Jobs.Archived to 'Y' for all Jobs that have not had a Cut record created since before 2014-01-01 OK, according to the Cuts.CutDate field. Seemed simple enough at the time.

So, the problem is this; The following section of the query correctly produces a single column result, containing only JobCode's that have no Cut records on file with a Cut.CutDate since 2014-01-01 ...

This works as intended ...

    SELECT `JobCode` FROM (
        SELECT m.`JobCode`
        FROM `Jobs` as jx
        JOIN (
            SELECT j.`JobCode`, MAX(c.`CutDate`) AS MaxCutDate
            FROM `Cuts` AS c
            LEFT JOIN `Jobs` AS j ON (j.`JobCode`=c.`JobCode`)
            GROUP BY j.`JobCode`
        ) AS m ON (m.`JobCode`=jx.`JobCode`)
        WHERE m.MaxCutDate < '2014-01-01'
    ) AS tmp_table

[ NOTE: The double SELECT ( SELECT ...))) is to get around an inane but well known issue with MySQL. I see no problem here. ]

As an example, the above might produce the following ...

JobCode
-------
  930
  935
  936

There happen to also be JobCode's 931 through 934 in the Jobs table. These are not included in the above results, because they do not have Cut records with CutDate's more recent than 2014-01-01. This is what I want and what I expected.

HOWEVER ... when I run the full query, including the UPDATE part ...

UPDATE `Jobs`
SET `Archived` = 'Y'
WHERE `JobCode` IN (
    SELECT `JobCode` FROM (
        SELECT m.`JobCode`
        FROM `Jobs` as jx
        JOIN (
            SELECT j.`JobCode`, MAX(c.`CutDate`) AS MaxCutDate
            FROM `Cuts` AS c
            LEFT JOIN `Jobs` AS j ON (j.`JobCode`=c.`JobCode`)
            GROUP BY j.`JobCode`
        ) AS m ON (m.`JobCode`=jx.`JobCode`)
        WHERE m.MaxCutDate < '2014-01-01'
    ) AS tmp_table
)

... then well, crazy things seem to happen! What I expect is that ONLY Jobs records that exist in the output from the SELECT should get updated. But random other Jobs records do as well! I say, "random", because a) I can find no rhyme or reason for the ones they.

So, that's the problem. Please oh pretty, pretty please ... WTF am I missing here?

Thanks a bunch!

Upvotes: 2

Views: 82

Answers (3)

gruvin
gruvin

Reputation: 51

Thanks for the great advice everyone. You were ALL correct.

In the end, the "random" results were in fact an error in the surrounding PHP code. Random Jobs were not in fact being archived.

On top of that, I'm using phpMyAdmin, which apparently has a (another :-P) bug. When I use sub-queries, it will report, "Zero (0) rows affected", when that's simply not true! I was believing this and not checking further on the actual table contents after my query. I wasted a LOT of time like this. :-/

Losing the LEFT in LEFT JOIN definitely helps with the case where a Job has no related Cuts record.

In the end, I went with @Andrew's version, only since I believe it would be the most efficient, due to not using HAVING ... that and it's the way I originally tried, so it feels better hehe ...

<!-- language: SQL -->
UPDATE `Jobs`
JOIN (SELECT `JobCode` FROM (
        SELECT m.`JobCode`
        FROM `Jobs` as jx
        JOIN (
            SELECT j.`JobCode`, MAX(c.`CutDate`) AS MaxCutDate
            FROM `Cuts` AS c
            JOIN `Jobs` AS j ON (j.`JobCode`=c.`JobCode`)
            GROUP BY j.`JobCode`
        ) AS m ON (m.`JobCode`=jx.`JobCode`)
        WHERE m.MaxCutDate < '2014-01-01'
    ) AS tmp_table
  ) AS t ON t.`JobCode` = `Jobs`.`JobCode`
SET `Archived` = 'Y'

Thanks again!

Upvotes: 0

Andrew
Andrew

Reputation: 1866

Have you tried JOIN instead of WHERE condition?

UPDATE `Jobs`
JOIN (SELECT `JobCode` FROM (
        SELECT m.`JobCode`
        FROM `Jobs` as jx
        JOIN (
            SELECT j.`JobCode`, MAX(c.`CutDate`) AS MaxCutDate
            FROM `Cuts` AS c
            JOIN `Jobs` AS j ON (j.`JobCode`=c.`JobCode`)
            GROUP BY j.`JobCode`
        ) AS m ON (m.`JobCode`=jx.`JobCode`)
        WHERE m.MaxCutDate < '2014-01-01'
    ) AS tmp_table
  ) AS t ON t.`JobCode` = `Jobs`.`JobCode`
SET `Archived` = 'Y'

Also I replaces LEFT JOIN in subquery with JOIN. Because we are selecting JobCode which must exists in table Jobs - this is necessary for UPDATE. If JobCode is null, then there are nothing to update as I understood.

Upvotes: 1

Saqib Rokadia
Saqib Rokadia

Reputation: 649

I think the order of your left join is incorrect. You want to include all JobCodes from the Jobs table and then if there are records in the Cuts table then you want to include them if they exist.

You also don't handle what happens when a cut record doesn't exist in that case the CutDate will be null.

    SELECT j.`JobCode`
      FROM `Jobs` as j
 LEFT JOIN `Cuts` AS c ON c.`JobCode` = j.`JobCode`
  GROUP BY j.`JobCode`
    HAVING MAX(IFNULL(c.`CutDate`, '2016-01-01')) < '2014-01-01'

So the date can be set to after 2014 or before depending on if you want to archive the record or not if not Cuts exist for the record. You can use having statements to work on aggregate functions inorder to use the max function and then compare if it is less than 2014.

I'm not sure why you were seeing random results, but I think between the left join being incorrect and maybe not aliasing the JOBS table in the update statement. I'd suggest testing it out before running the update with the following.

# ju will stand for "Jobs Update"
SELECT ju.`JobCode`
  FROM `Jobs` ju
 WHERE ju.`JobCode` IN (
        SELECT j.`JobCode`
          FROM `Jobs` as j
     LEFT JOIN `Cuts` AS c ON c.`JobCode` = j.`JobCode`
      GROUP BY j.`JobCode`
        HAVING MAX(IFNULL(c.`CutDate`, '2016-01-01')) < '2014-01-01'
 )

If you're ready to run the update then you can switch it over.

# ju will stand for "Jobs Update"
UPDATE `Jobs` ju
   SET ju.`Archived` = 'Y'
 WHERE ju.`JobCode` IN (
        SELECT j.`JobCode`
          FROM `Jobs` as j
     LEFT JOIN `Cuts` AS c ON c.`JobCode` = j.`JobCode`
      GROUP BY j.`JobCode`
        HAVING MAX(IFNULL(c.`CutDate`, '2016-01-01')) < '2014-01-01'
 )

Upvotes: 1

Related Questions