Reputation: 51
I think I'm losing it! :-/
So, I have this MySQL UPDATE query ...
UPDATE `Jobs`
SET `Archived` = 'Y'
WHERE `JobCode` IN (
SELECT `JobCode` FROM (
SELECT m.`JobCode`
FROM `Jobs` as jx
JOIN (
SELECT j.`JobCode`, MAX(c.`CutDate`) AS MaxCutDate
FROM `Cuts` AS c
LEFT JOIN `Jobs` AS j ON (j.`JobCode`=c.`JobCode`)
GROUP BY j.`JobCode`
) AS m ON (m.`JobCode`=jx.`JobCode`)
WHERE m.MaxCutDate < '2014-01-01'
) AS tmp_table
)
The tables Customers
, Jobs
and Cuts
are part of a simple relational group. Customers
have multiple Jobs
, Jobs
have multiple Cuts
. I have left this complexity in for my question here, because I am not sure if it is part of the problem or not and didn't want to over simplify the problem away.
The query's intention is to set Jobs
.Archived
to 'Y'
for all Jobs
that have not had a Cut
record created since before 2014-01-01 OK
, according to the Cuts
.CutDate
field. Seemed simple enough at the time.
So, the problem is this; The following section of the query correctly produces a single column result, containing only JobCode
's that have no Cut
records on file with a Cut
.CutDate
since 2014-01-01
...
This works as intended ...
SELECT `JobCode` FROM (
SELECT m.`JobCode`
FROM `Jobs` as jx
JOIN (
SELECT j.`JobCode`, MAX(c.`CutDate`) AS MaxCutDate
FROM `Cuts` AS c
LEFT JOIN `Jobs` AS j ON (j.`JobCode`=c.`JobCode`)
GROUP BY j.`JobCode`
) AS m ON (m.`JobCode`=jx.`JobCode`)
WHERE m.MaxCutDate < '2014-01-01'
) AS tmp_table
[ NOTE: The double SELECT ( SELECT ...))) is to get around an inane but well known issue with MySQL. I see no problem here. ]
As an example, the above might produce the following ...
JobCode
-------
930
935
936
There happen to also be JobCode
's 931 through 934 in the Jobs
table. These are not included in the above results, because they do not have Cut
records with CutDate
's more recent than 2014-01-01
. This is what I want and what I expected.
HOWEVER ... when I run the full query, including the UPDATE
part ...
UPDATE `Jobs`
SET `Archived` = 'Y'
WHERE `JobCode` IN (
SELECT `JobCode` FROM (
SELECT m.`JobCode`
FROM `Jobs` as jx
JOIN (
SELECT j.`JobCode`, MAX(c.`CutDate`) AS MaxCutDate
FROM `Cuts` AS c
LEFT JOIN `Jobs` AS j ON (j.`JobCode`=c.`JobCode`)
GROUP BY j.`JobCode`
) AS m ON (m.`JobCode`=jx.`JobCode`)
WHERE m.MaxCutDate < '2014-01-01'
) AS tmp_table
)
... then well, crazy things seem to happen! What I expect is that ONLY Jobs
records that exist in the output from the SELECT
should get updated. But random other Jobs
records do as well! I say, "random", because a) I can find no rhyme or reason for the ones they.
So, that's the problem. Please oh pretty, pretty please ... WTF am I missing here?
Thanks a bunch!
Upvotes: 2
Views: 82
Reputation: 51
Thanks for the great advice everyone. You were ALL correct.
In the end, the "random" results were in fact an error in the surrounding PHP code. Random Jobs were not in fact being archived.
On top of that, I'm using phpMyAdmin
, which apparently has a (another :-P) bug. When I use sub-queries, it will report, "Zero (0) rows affected", when that's simply not true! I was believing this and not checking further on the actual table contents after my query. I wasted a LOT of time like this. :-/
Losing the LEFT in LEFT JOIN definitely helps with the case where a Job has no related Cuts record.
In the end, I went with @Andrew's version, only since I believe it would be the most efficient, due to not using HAVING ... that and it's the way I originally tried, so it feels better hehe ...
<!-- language: SQL -->
UPDATE `Jobs`
JOIN (SELECT `JobCode` FROM (
SELECT m.`JobCode`
FROM `Jobs` as jx
JOIN (
SELECT j.`JobCode`, MAX(c.`CutDate`) AS MaxCutDate
FROM `Cuts` AS c
JOIN `Jobs` AS j ON (j.`JobCode`=c.`JobCode`)
GROUP BY j.`JobCode`
) AS m ON (m.`JobCode`=jx.`JobCode`)
WHERE m.MaxCutDate < '2014-01-01'
) AS tmp_table
) AS t ON t.`JobCode` = `Jobs`.`JobCode`
SET `Archived` = 'Y'
Thanks again!
Upvotes: 0
Reputation: 1866
Have you tried JOIN
instead of WHERE
condition?
UPDATE `Jobs`
JOIN (SELECT `JobCode` FROM (
SELECT m.`JobCode`
FROM `Jobs` as jx
JOIN (
SELECT j.`JobCode`, MAX(c.`CutDate`) AS MaxCutDate
FROM `Cuts` AS c
JOIN `Jobs` AS j ON (j.`JobCode`=c.`JobCode`)
GROUP BY j.`JobCode`
) AS m ON (m.`JobCode`=jx.`JobCode`)
WHERE m.MaxCutDate < '2014-01-01'
) AS tmp_table
) AS t ON t.`JobCode` = `Jobs`.`JobCode`
SET `Archived` = 'Y'
Also I replaces LEFT JOIN
in subquery with JOIN
. Because we are selecting JobCode
which must exists in table Jobs
- this is necessary for UPDATE
. If JobCode
is null, then there are nothing to update as I understood.
Upvotes: 1
Reputation: 649
I think the order of your left join is incorrect. You want to include all JobCodes from the Jobs table and then if there are records in the Cuts table then you want to include them if they exist.
You also don't handle what happens when a cut record doesn't exist in that case the CutDate will be null.
SELECT j.`JobCode`
FROM `Jobs` as j
LEFT JOIN `Cuts` AS c ON c.`JobCode` = j.`JobCode`
GROUP BY j.`JobCode`
HAVING MAX(IFNULL(c.`CutDate`, '2016-01-01')) < '2014-01-01'
So the date can be set to after 2014 or before depending on if you want to archive the record or not if not Cuts exist for the record. You can use having statements to work on aggregate functions inorder to use the max function and then compare if it is less than 2014.
I'm not sure why you were seeing random results, but I think between the left join being incorrect and maybe not aliasing the JOBS table in the update statement. I'd suggest testing it out before running the update with the following.
# ju will stand for "Jobs Update"
SELECT ju.`JobCode`
FROM `Jobs` ju
WHERE ju.`JobCode` IN (
SELECT j.`JobCode`
FROM `Jobs` as j
LEFT JOIN `Cuts` AS c ON c.`JobCode` = j.`JobCode`
GROUP BY j.`JobCode`
HAVING MAX(IFNULL(c.`CutDate`, '2016-01-01')) < '2014-01-01'
)
If you're ready to run the update then you can switch it over.
# ju will stand for "Jobs Update"
UPDATE `Jobs` ju
SET ju.`Archived` = 'Y'
WHERE ju.`JobCode` IN (
SELECT j.`JobCode`
FROM `Jobs` as j
LEFT JOIN `Cuts` AS c ON c.`JobCode` = j.`JobCode`
GROUP BY j.`JobCode`
HAVING MAX(IFNULL(c.`CutDate`, '2016-01-01')) < '2014-01-01'
)
Upvotes: 1