Can I control which JOINed row gets used in an UPDATE?

Question

Once upon a time, I had a table like this:

CREATE TABLE `Events` (
  `EvtId`   INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `AlarmId` INT UNSIGNED,
  -- Other fields omitted for brevity

  PRIMARY KEY (`EvtId`)
);

AlarmId was permitted to be NULL.

Now, because I want to expand from zero-or-one alarm per event to zero-or-more alarms per event, in a software update I'm changing instances of my database to have this instead:

CREATE TABLE `Events` (
  `EvtId`   INT UNSIGNED NOT NULL AUTO_INCREMENT,
  -- Other fields omitted for brevity

  PRIMARY KEY (`EvtId`)
);

CREATE TABLE `EventAlarms` (
  `EvtId`   INT UNSIGNED NOT NULL,
  `AlarmId` INT UNSIGNED NOT NULL,

  PRIMARY KEY (`EvtId`, `AlarmId`),
  CONSTRAINT `fk_evt` FOREIGN KEY (`EvtId`) REFERENCES `Events` (`EvtId`)
     ON DELETE CASCADE ON UPDATE CASCADE 
);

So far so good.

The data is easy to migrate, too:

INSERT INTO `EventAlarms`
  SELECT `EvtId`, `AlarmId` FROM `Events` WHERE `AlarmId` IS NOT NULL;

ALTER TABLE `Events` DROP COLUMN `AlarmId`;

Thing is, my system requires that a downgrade also be possible. I accept that downgrades will sometimes be lossy in terms of data, and that's okay. However, they do need to work where possible, and result in the older database structure while making a best effort to keep as much original data as is reasonably possible.

In this case, that means going from zero-or-more alarms per event, to zero-or-one alarm per event. I could do it like this:

ALTER TABLE `Events` ADD COLUMN `AlarmId` INT UNSIGNED;

UPDATE `Events`
  LEFT JOIN `EventAlarms` USING(`EvtId`)
  SET `Events`.`AlarmId` = `EventAlarms`.`AlarmId`;

DROP TABLE `EventAlarms`;

… which is kind of fine, since I don't really care which one gets kept (it's best-effort, remember). However, as warned, this is not good for replication as the result may be unpredictable:

> SHOW WARNINGS;
Unsafe statement written to the binary log using statement format since
BINLOG_FORMAT = STATEMENT. Statements writing to a table with an auto-
increment column after selecting from another table are unsafe because the
order in which rows are retrieved determines what (if any) rows will be
written. This order cannot be predicted and may differ on master and the
slave.

Is there a way to somehow "order" or "limit" the join in the update, or shall I just skip this whole enterprise and stop trying to be clever? If the latter, how can I leave the downgraded AlarmId as NULL iff there were multiple rows in the new table between which we cannot safely distinguish? I do want to migrate the AlarmId if there is only one.

As a downgrade is a "one-time" maintenance operation, it doesn't have to be exactly real-time, but speed would be nice. Both tables could potentially have thousands of rows.

(MariaDB 5.5.56 on CentOS 7, but must also work on whatever ships with CentOS 6.)

Lightness Races in Orbit · Accepted Answer

First, we can perform a bit of analysis, with a self-join:

SELECT `A`.`EvtId`, COUNT(`B`.`EvtId`) AS `N`
FROM `EventAlarms` AS `A`
LEFT JOIN `EventAlarms` AS `B` ON (`A`.`EvtId` = `B`.`EvtId`)
GROUP BY `B`.`EvtId`

The result will look something like this:

EvtId       N
--------------
370         1
371         1
372         4
379         1
380         1
382        16
383         1
384         1

Now you can, if you like, drop all the rows representing events that map to more than one alarm (which you suggest as a fallback solution; I think this makes sense, though you could modify the below to leave one of them in place if you really wanted).

Instead of actually DELETEing anything, though, it's easier to introduce a new table, populated using the self-joining query shown above:

CREATE TEMPORARY TABLE `_migrate` (
   `EvtId` INT UNSIGNED,
   `n` INT UNSIGNED,

   PRIMARY KEY (`EvtId`),
   KEY `idx_n` (`n`)
);

INSERT INTO `_migrate`
   SELECT `A`.`EvtId`, COUNT(`B`.`EvtId`) AS `n`
   FROM `EventAlarms` AS `A`
   LEFT JOIN `EventAlarms` AS `B` ON(`A`.`EvtId` = `B`.`EvtId`)
   GROUP BY `B`.`EvtId`;

Then your update becomes:

UPDATE `Events`
   LEFT JOIN `_migrate` ON (`Events`.`EvtId` = `_migrate`.`EvtId` AND `_migrate`.`n` = 1)
   LEFT JOIN `EventAlarms` ON (`_migrate`.`EvtId` = `EventAlarms`.`EvtId`)
   SET `Events`.`AlarmId` = `EventAlarms`.`AlarmId`
   WHERE `EventAlarms`.`AlarmId` IS NOT NULL

And, finally, clean up after yourself:

DROP TABLE `_migrate`;
DROP TABLE `EventAlarms`;

MySQL still kicks out the same warning as before, but since know that at most one value will be pulled from the source tables, we can basically just ignore it.

It should even be reasonably efficient, as we can tell from the equivalent EXPLAIN SELECT:

EXPLAIN SELECT `Events`.`EvtId` FROM `Events`
   LEFT JOIN `_migrate` ON (`Events`.`EvtId` = `_migrate`.`EvtId` AND `_migrate`.`n` = 1)
   LEFT JOIN `EventAlarms` ON (`_migrate`.`EvtId` = `EventAlarms`.`EvtId`)
   WHERE `EventAlarms`.`AlarmId` IS NOT NULL

id  select_type table       type   possible_keys      key     key_len ref               rows Extra
---------------------------------------------------------------------------------------------------------------------
1   SIMPLE      _migrate    ref    PRIMARY,idx_n      idx_n   5       const             6    Using index
1   SIMPLE      EventAlarms ref    PRIMARY,fk_AlarmId PRIMARY 8       db._migrate.EvtId 1    Using where; Using index
1   SIMPLE      Events      eq_ref PRIMARY            PRIMARY 8       db._migrate.EvtId 1    Using where; Using index

Can I control which JOINed row gets used in an UPDATE?

Answers (2)

Related Questions