Fletch
Fletch

Reputation: 367

**Occasional** Arithmetic overflow error converting expression to data type int

I'm running an update script to obfuscate data and am occasionally experiencing the arithmetic overflow error message, as in the title. The table being updated has 260k records and yet the update script will need to be run several times to produce the error. Although it's so rare I can't rely on the code until it's fixed as it's a pain to debug.

Looking at other similar questions, this is often resolved by changing the data type e.g from INT to BIGINT either in the table or in a calculation. However, I can't see where this could be required. I've reduced the script to the below as I've managed to pin point it to the update of one column.

A function is being called by the update and I've included this below. I suspect that, due to the randomness of the error, the use of the NEW_ID function could be causing it but I haven't been able to re-create the error when just running this part of the function multiple times. The NEW_ID function can't be used in functions so it's being called from a view, also included below.

Update script:

UPDATE dbo.Addresses
SET HouseNumber = CASE WHEN LEN(HouseNumber) > 0 
                       THEN dbo.fn_GenerateRandomString (LEN(HouseNumber), 1, 1, 1) 
                       ELSE HouseNumber 
                  END

NEW_ID view and random string function

CREATE VIEW dbo.vw_GetNewID 
AS 
SELECT NEWID() AS New_ID

CREATE FUNCTION dbo.fn_GenerateRandomString (
@stringLength int, 
@upperCaseBit bit, 
@lowerCaseBit bit,
@numberBit bit
)
RETURNS nvarchar(100)
AS
BEGIN
-- Sanitise string length values.
IF ISNULL(@stringLength, -1) < 0
SET @stringLength = 0

-- Generate a random string from the specified character sets.
DECLARE @string nvarchar(100) = ''
SELECT
@string += c2
FROM
(
    SELECT TOP (@stringLength) c2 FROM (
        SELECT c1 FROM
        (
            VALUES ('A'),('B'),('C')
        ) AS T1(c1)
        WHERE @upperCaseBit = 1
        UNION ALL
        SELECT c1 FROM
        (
            VALUES ('a'),('b'),('c')
        ) AS T1(c1)
        WHERE @lowerCaseBit = 1
        SELECT c1 FROM
        (
            VALUES ('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9')
        ) AS T1(c1)
        WHERE @numberBit = 1
        ) 
    AS T2(c2)
    ORDER BY (SELECT ABS(CHECKSUM(New_ID)) from vw_GetNewID)
) AS T2

RETURN @string
END

Addresses table (for testing):

CREATE TABLE dbo.Addresses(HouseNumber nchar(32) NULL)

INSERT Addresses(HouseNumber)
VALUES ('DSjkmf jkghjsh35hjk h2jkhj3h jhf'),
    ('SDjfksj3548 ksjk'),
    (NULL),
    (''),
    ('2a'),
    ('1234567890'),
    ('An2b')

Note: only 7k of the rows in the addresses table have a value entered i.e. LEN(HouseNumber) > 0.

Upvotes: 1

Views: 1515

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269853

An arithmetic overflow in what is otherwise string-based code is confounding. But there is one thing that could be causing the arithmetic overflow. That is your ORDER BY clause:

ORDER BY (SELECT ABS(CHECKSUM(New_ID)) from vw_GetNewID)

CHECKSUM() returns an integer, whose range is -2,147,483,648 to 2,147,483,647. Note the absolute value of the smallest number is 2,147,483,648, and that is just outside the range. You can verify that SELECT ABS(CAST('-2147483648' as int)) generates the arithmetic overflow error.

You don't need the checksum(). Alas, you do need the view because this logic is in a function and NEWID() is side-effecting. But, you can use:

ORDER BY (SELECT New_ID from vw_GetNewID)

I suspect that the reason you are seeing this every million or so rows rather than every 4 billion rows or so is because the ORDER BY value is being evaluated multiple times for each row as part of the sorting process. Eventually, it is going to hit the lower limit.

EDIT:

If you care about efficiency, it is probably faster to do this using string operations rather than tables. I might suggest this version of the function:

CREATE VIEW vw_rand AS SELECT rand() as rand;
GO
CREATE FUNCTION dbo.fn_GenerateRandomString (
    @stringLength int, 
    @upperCaseBit bit, 
    @lowerCaseBit bit,
    @numberBit bit
)
RETURNS nvarchar(100)
AS
BEGIN
    DECLARE @string NVARCHAR(255) = '';
-- Sanitise string length values.
    IF ISNULL(@stringLength, -1) < 0
        SET @stringLength = 0;
    DECLARE @lets VARCHAR(255) = '';
    IF (@upperCaseBit = 1) SET @lets = @lets + 'ABC';
    IF (@lowerCaseBit = 1) SET @lets = @lets + 'abc';
    IF (@numberBit = 1) SET @lets = @lets + '0123456789';

    DECLARE @len int = len(@lets);

    WHILE @stringLength > 0 BEGIN
        SELECT @string += SUBSTRING(@lets, 1 + CAST(rand * @len as INT), 1)
        FROM vw_rand;
        SET @stringLength = @stringLength - 1;
    END;
    RETURN @string
END;

As a note: rand() is documented as being exclusive of the end of its range, so you don't have to worry about it returning exactly 1.

Also, this version is subtly different from your version because it can pull the same letter more than once (and as a consequence can also handle longer strings). I think this is actually a benefit.

Upvotes: 3

Related Questions