Reputation: 385405
Once upon a time, I had a table like this:
CREATE TABLE `Events` (
`EvtId` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`AlarmId` INT UNSIGNED,
-- Other fields omitted for brevity
PRIMARY KEY (`EvtId`)
);
AlarmId
was permitted to be NULL
.
Now, because I want to expand from zero-or-one alarm per event to zero-or-more alarms per event, in a software update I'm changing instances of my database to have this instead:
CREATE TABLE `Events` (
`EvtId` INT UNSIGNED NOT NULL AUTO_INCREMENT,
-- Other fields omitted for brevity
PRIMARY KEY (`EvtId`)
);
CREATE TABLE `EventAlarms` (
`EvtId` INT UNSIGNED NOT NULL,
`AlarmId` INT UNSIGNED NOT NULL,
PRIMARY KEY (`EvtId`, `AlarmId`),
CONSTRAINT `fk_evt` FOREIGN KEY (`EvtId`) REFERENCES `Events` (`EvtId`)
ON DELETE CASCADE ON UPDATE CASCADE
);
So far so good.
The data is easy to migrate, too:
INSERT INTO `EventAlarms`
SELECT `EvtId`, `AlarmId` FROM `Events` WHERE `AlarmId` IS NOT NULL;
ALTER TABLE `Events` DROP COLUMN `AlarmId`;
Thing is, my system requires that a downgrade also be possible. I accept that downgrades will sometimes be lossy in terms of data, and that's okay. However, they do need to work where possible, and result in the older database structure while making a best effort to keep as much original data as is reasonably possible.
In this case, that means going from zero-or-more alarms per event, to zero-or-one alarm per event. I could do it like this:
ALTER TABLE `Events` ADD COLUMN `AlarmId` INT UNSIGNED;
UPDATE `Events`
LEFT JOIN `EventAlarms` USING(`EvtId`)
SET `Events`.`AlarmId` = `EventAlarms`.`AlarmId`;
DROP TABLE `EventAlarms`;
… which is kind of fine, since I don't really care which one gets kept (it's best-effort, remember). However, as warned, this is not good for replication as the result may be unpredictable:
> SHOW WARNINGS;
Unsafe statement written to the binary log using statement format since
BINLOG_FORMAT = STATEMENT. Statements writing to a table with an auto-
increment column after selecting from another table are unsafe because the
order in which rows are retrieved determines what (if any) rows will be
written. This order cannot be predicted and may differ on master and the
slave.
Is there a way to somehow "order" or "limit" the join in the update, or shall I just skip this whole enterprise and stop trying to be clever? If the latter, how can I leave the downgraded AlarmId
as NULL
iff there were multiple rows in the new table between which we cannot safely distinguish? I do want to migrate the AlarmId
if there is only one.
As a downgrade is a "one-time" maintenance operation, it doesn't have to be exactly real-time, but speed would be nice. Both tables could potentially have thousands of rows.
(MariaDB 5.5.56 on CentOS 7, but must also work on whatever ships with CentOS 6.)
Upvotes: 1
Views: 57
Reputation: 48207
Use a subquery and user variables to select just one EventAlarms
In your update instead of EventAlarms
use
( SELECT `EvtId`, `AlarmId`
FROM ( SELECT `EvtId`, `AlarmId`,
@rn := if ( @EvtId = `EvtId`
@rn + 1,
if ( @EvtId := `EvtId` , 1, 1)
) as rn
FROM `EventAlarms`
CROSS JOIN ( SELECT @EvtId := 0, @rn := 0) as vars
ORDER BY EvtId, AlarmId
) as t
WHERE rn = 1
) as SingleEventAlarms
Upvotes: 0
Reputation: 385405
First, we can perform a bit of analysis, with a self-join:
SELECT `A`.`EvtId`, COUNT(`B`.`EvtId`) AS `N`
FROM `EventAlarms` AS `A`
LEFT JOIN `EventAlarms` AS `B` ON (`A`.`EvtId` = `B`.`EvtId`)
GROUP BY `B`.`EvtId`
The result will look something like this:
EvtId N
--------------
370 1
371 1
372 4
379 1
380 1
382 16
383 1
384 1
Now you can, if you like, drop all the rows representing events that map to more than one alarm (which you suggest as a fallback solution; I think this makes sense, though you could modify the below to leave one of them in place if you really wanted).
Instead of actually DELETE
ing anything, though, it's easier to introduce a new table, populated using the self-joining query shown above:
CREATE TEMPORARY TABLE `_migrate` (
`EvtId` INT UNSIGNED,
`n` INT UNSIGNED,
PRIMARY KEY (`EvtId`),
KEY `idx_n` (`n`)
);
INSERT INTO `_migrate`
SELECT `A`.`EvtId`, COUNT(`B`.`EvtId`) AS `n`
FROM `EventAlarms` AS `A`
LEFT JOIN `EventAlarms` AS `B` ON(`A`.`EvtId` = `B`.`EvtId`)
GROUP BY `B`.`EvtId`;
Then your update becomes:
UPDATE `Events`
LEFT JOIN `_migrate` ON (`Events`.`EvtId` = `_migrate`.`EvtId` AND `_migrate`.`n` = 1)
LEFT JOIN `EventAlarms` ON (`_migrate`.`EvtId` = `EventAlarms`.`EvtId`)
SET `Events`.`AlarmId` = `EventAlarms`.`AlarmId`
WHERE `EventAlarms`.`AlarmId` IS NOT NULL
And, finally, clean up after yourself:
DROP TABLE `_migrate`;
DROP TABLE `EventAlarms`;
MySQL still kicks out the same warning as before, but since know that at most one value will be pulled from the source tables, we can basically just ignore it.
It should even be reasonably efficient, as we can tell from the equivalent EXPLAIN SELECT
:
EXPLAIN SELECT `Events`.`EvtId` FROM `Events`
LEFT JOIN `_migrate` ON (`Events`.`EvtId` = `_migrate`.`EvtId` AND `_migrate`.`n` = 1)
LEFT JOIN `EventAlarms` ON (`_migrate`.`EvtId` = `EventAlarms`.`EvtId`)
WHERE `EventAlarms`.`AlarmId` IS NOT NULL
id select_type table type possible_keys key key_len ref rows Extra
---------------------------------------------------------------------------------------------------------------------
1 SIMPLE _migrate ref PRIMARY,idx_n idx_n 5 const 6 Using index
1 SIMPLE EventAlarms ref PRIMARY,fk_AlarmId PRIMARY 8 db._migrate.EvtId 1 Using where; Using index
1 SIMPLE Events eq_ref PRIMARY PRIMARY 8 db._migrate.EvtId 1 Using where; Using index
Upvotes: 1