Reputation: 463

How to effectively find first gap (missing number) in a sequence starting with N

I have a database of products with 8 digit codes starting at number 10000000. I'd like to create a function that copies any given products and gives them new codes: new code = old code + 1. The problem is that i never know if the code already exists in the database, so i'm working on a function that tells me what is the next free code. I'm really stuck on creating any efficient SQL query. There are many questions / answers on SO on this topic, but i couldn't really find any which fits my needs:

Is fast (within seconds) as there could be millions of product codes in a row.
May start at arbitrary code N (in queries below i use 10000000, but that's just for testing).
Returns first free code, not next existing code + 1 (even if neither N and N + 1 exist in the database).
Returns only 1 free code.

I tried:

SELECT
    code + 1
FROM
    db_table DBT1
WHERE
    code >= 10000000
    AND NOT EXISTS
    (
        SELECT
            NULL
        FROM
            db_table DBT2 
        WHERE
            DBT2.code = DBT1.code + 1
    )
ORDER BY
    code
LIMIT
    1

Explain returns:

id  select_type             table   type    possible_keys   key     key_len     ref     rows    Extra
1   PRIMARY                 DBT1    index   code            code    99          NULL    1       Using where; Using index
2   DEPENDENT SUBQUERY      DBT2    range   code            code    99          NULL    19859   Using where; Using index

Analyze returns:

id  select_type         table   type    possible_keys   key     key_len     ref     rows    r_rows      filtered    r_filtered  Extra
1   PRIMARY             DBT1    index   code            code    99          NULL    39718   35963.00    100.00      0.00        Using where; Using index
2   DEPENDENT SUBQUERY  DBT2    range   code            code    99          NULL    19859   17988.22    100.00      0.01        Using where; Using index

This query is neither fast as it took 227 s to find a free code (20035953) and did not really return 1st free code because codes between 10000000 and 20000000 were free.

I also tried:

SELECT
    DBT1.code + 1
FROM
    db_table AS DBT1
LEFT JOIN
    db_table AS DBT2 ON DBT2.code = DBT1.code + 1
WHERE
    DBT1.code >= 10000000
    AND DBT2.code IS NULL
LIMIT
    1

Explain returns:

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   SIMPLE          DBT1    index   code            code    99          NULL    39718   Using where; Using index
1   SIMPLE          DBT2    ALL     code            NULL    NULL        NULL    39718   Range checked for each record (index map: 0x2)

I guess this query is even less efficient since for the same request it timeouted.

After more testing i tried both queries on production server (server i tested it on should have even more resources) and they both work really fast (< second). I've no idea what's the problem here. Maybe it has something to do with different version on mysql 5.5.5-10.2.16-MariaDB-10.2.16+maria (test) vs 5.5.5-10.4.15-MariaDB-1:10.4.15+maria (production), maybe some weird configuration i don't really understand - as both tables are exactly the same structure and keys. Anyway i decied to go with the first query as it seems to work slightly faster.

After even more testing i “figured” that the problem has something to do with the fact that column code is VARCHAR(32) (because there is sometimes stuff like 80000001-SALE) which i should have had added in the question in the first place. So i guess the database can't work very well like this. In the end i created and used SQL function i added as an answer which i'm not going to accept because it feels kind of hacky to me.

Upvotes: 2

Answers (3)

Mirous

Reputation: 463

One possible solution is to create and add SQL function which accepts up to 8 digit (int) code and adds 1 until there is no select-match on db_table. It is possibly not the most efficient thing to do, but it works well on both versions of MariaDB and doesn't care that much about type of code column (it can either be int / varchar). Nice bonus is that it returns N even if it doesn't exist and not next existing code + 1.

DROP FUNCTION IF EXISTS nextFreeCode;
DELIMITER $$
CREATE FUNCTION nextFreeCode(code INTEGER(8))
    RETURNS VARCHAR(32)
    DETERMINISTIC
    BEGIN
        DECLARE freeCode INT DEFAULT code;
        DECLARE selectedCode VARCHAR(32) DEFAULT NULL;

        WHILE TRUE DO
        SELECT code INTO selectedCode FROM db_table WHERE code LIKE freeCode LIMIT 1;

        IF selectedCode IS NULL
        THEN
            RETURN freeCode;
        ELSE
            SET selectedCode = NULL;
            SET freeCode = freeCode + 1;
        END IF;
        END WHILE;
    END$$
DELIMITER ;

You can use this function as easy as this:

SELECT nextFreeCode(20001245) AS newCode;

Upvotes: 0

SenorCardgage

Reputation: 194

You can also check mysql's process for executing the query by doing:

EXPLAIN ANALYZE SELECT stmt_here

Which should help you understand why the query is taking so long, and if your index is actually being used. Check how many rows are being scanned and how long each process is taking, that should narrow down the possible causes of the slow query so you will know what parts of the query need to be fixed.

Also, it may have been a while since the last time someone Analyzed the table. After adding or changing indexes or adding lots of rows to your table, you should use ANALYZE TABLE tablename command so that the mysql optimizer can reevaluate the rank of your indexes, so that way it will make sure to use them correctly and use the best one for each query.

Keep in mind, if most of your records have to be scanned for the query to work, then mysql will choose not to use indexes for a good reason, because it is faster to do sequential reads than to do disk seeks with indexes if it has to scan nearly all the records anyway. So if you know your query has to scan through most or all of the records, then you shouldn't try to force mysql to use the index. However, if you know this is not the case, and mysql isn't using your indexes no matter what you do, then you can always force it to use index with force index command.

Upvotes: 2

Bill Karwin

Reputation: 562651

Really, if you needed to make returning the first free code faster you could do this:

SELECT code FROM free_codes ORDER BY code LIMIT 1;

Yes, this requires that you have pre-populated a table free_codes. When you want to use the next free code, remove it from this table, so the next time you check, the same query returns a different code (the next free code).

That may seem like a lot of work, but it illustrates a principle that comes up repeatedly in software design:

If the work you need to do takes too long, do part of it in advance and save it.

It's basically the reason why caching works.

Upvotes: 2

How to effectively find first gap (missing number) in a sequence starting with N

Answers (3)

Related Questions