Bwire
Bwire

Reputation: 1201

Mysql INSERT ... ON DUPLICATE KEY Reinsert with new key

I have a query like this

INSERT INTO data(data_uuid, ...) VALUES(uuid_v4(), ...); 

I have read about on duplicate update but what I am looking for instead of updating the existing column is to reinsert the row with an updated uuid. Is there any constructs that can make life barer-able here?

I keep hitting such an error :

ERROR 1062 (23000): Duplicate entry '0f7018da-4df5-4e22-b995-3b3c3e4e85d3' for key 'data_uuid'

To resolve it temporarily I am padding the UUID however I would like to have cleaner solution that still keeps me with the UUID.

The uuid function looks like this :

CREATE FUNCTION uuid_v4() RETURNS char(36) CHARSET latin1
BEGIN
    SET @h1 = LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0');
    SET @h2 = LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0');
    SET @h3 = LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0');
    SET @h6 = LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0');
    SET @h7 = LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0');
    SET @h8 = LPAD(HEX(FLOOR(RAND() * 0xffff)), 4, '0');
    SET @h4 = CONCAT('4', LPAD(HEX(FLOOR(RAND() * 0x0fff)), 3, '0'));
    SET @h5 = CONCAT(HEX(FLOOR(RAND() * 4 + 8)),
                LPAD(HEX(FLOOR(RAND() * 0x0fff)), 3, '0'));
    RETURN LOWER(CONCAT(
        @h1, @h2, '-', @h3, '-', @h4, '-', @h5, '-', @h6, @h7, @h8
    ));
END 

Upvotes: 2

Views: 234

Answers (1)

Elias Soares
Elias Soares

Reputation: 10264

As said in comments, if you are getting collisions with UUID with just a million rows (yes, just) that's because your random uuid generation is using some weak random generator.

Since you now posted your uuid_v4() function, and it relies on MySQL rand(), I can explain why your code is failing.

According to mysql docs: http://dev.mysql.com/doc/refman/5.7/en/mathematical-functions.html

RAND() is not meant to be a perfect random generator. It is a fast way to generate random numbers on demand that is portable between platforms for the same MySQL version.

This means that you can't use mysql to generate your uuid, at least not with RAND() function.

You will need to generate the uuid outside mysql of possible. There's a few libraries for it in many languages:

Always check if the library you choose uses a Cryptographically-safe random generator.

UPDATE

It's possible to generate safe UUID V4 on MySQL side using random_bytes() function:

This function returns a binary string of len random bytes generated using the random number generator of the SSL library.

So we can update your function to:

CREATE FUNCTION uuid_v4s()
    RETURNS CHAR(36)
BEGIN
    -- 1th and 2nd block are made of 6 random bytes
    SET @h1 = HEX(RANDOM_BYTES(4));
    SET @h2 = HEX(RANDOM_BYTES(2));

    -- 3th block will start with a 4 indicating the version, remaining is random
    SET @h3 = SUBSTR(HEX(RANDOM_BYTES(2)), 2, 3);

    -- 4th block first nibble can only be 8, 9 A or B, remaining is random
    SET @h4 = CONCAT(HEX(FLOOR(ASCII(RANDOM_BYTES(1)) / 64)+8),
                SUBSTR(HEX(RANDOM_BYTES(2)), 2, 3));

    -- 5th block is made of 6 random bytes
    SET @h5 = HEX(RANDOM_BYTES(6));

    -- Build the complete UUID
    RETURN LOWER(CONCAT(
        @h1, '-', @h2, '-4', @h3, '-', @h4, '-', @h5
    ));
END

This function should be safe enough to use without care about collisions unless you have a very massive row count.

Test

I've created following test scenario: Insert random UUID v4 as primary key for a table until 40.000.000 rows are created. When a collision is found, the row is updated incrementing collisions column:

INSERT INTO test (uuid) VALUES (uuid_v4()) ON DUPLICATE KEY UPDATE collisions=collisions+1;

The sum of collisions after 40 million rows with each function is:

+----------+----------------+
| RAND()   | RANDOM_BYTES() |
+----------+----------------+
|       55 |              0 |
+----------+----------------+

The number collisions in both scenarios tends to increase as number of rows grows.

Upvotes: 2

Related Questions