Reyrey88887
Reyrey88887

Reputation: 11

The same SQL query is yielding different results each time it is run. How do I resolve?

I'm having an issue where I am running a query and each time it produces different results. I'm working with relatively large sets of data (700,000+) and the differences are about 50-100.

I'm taking raw data from a staging table on a specific date (so the data is not changing) and then I'm loading that into a temp table.

Each time I check the count of the Temp table, it is the same so the issue is not there. However on the next part, I'm adding some grouping around dates and other specific details. And when I run this, it will have a different population each time it is run.

As an example, this is a very basic version of my script:

IF OBJECT_ID(N'tempdb..#Table1') IS NOT NULL
DROP TABLE [#Table1]
SELECT ORDER ID
,CITY
,DATE
,AMOUNT
,CURRENCY
INTO #TABLE1
FROM #TABLE2 --This is the data from the staging table

--GROUPING
IF OBJECT_ID(N'tempdb..#Table3') IS NOT NULL
DROP TABLE [#Table3]
SELECT CITY
,DATE
,ROUND(SUM(CONVERT(FLOAT, AMOUNT)),-1)
,CURRENCY
INTO #TABLE3
FROM #TABLE1
GROUP BY CITY,DATE,CURRENCY
HAVING ROUND(SUM(CONVERT(FLOAT, AMOUNT)),-1) = 0

Each time I run this, the record count for #Table3 changes. Any ideas what could be causing this? as mentioned, #Table1 shows the full data set consistently and the underlying source data has not changed.

Upvotes: 0

Views: 3507

Answers (1)

Dai
Dai

Reputation: 155390

I recommend you change your script to use only table-variables. Like so:

  • I assume that you absolutely have to keep on using #table2 for source-data.

    • But please consider changing it to a normal (non-temporary, non-table-variable) table if it contains persisted or expensive-to-produce data.
  • Your query has HAVING ROUND( x, -1 ) = 0 - which I don't think is useful - nor particularly clear.

    • ROUND( x, -1 ) means "round to the nearest tenth-place (including all digits to the right, including digits to the right of the radix place).
      • e.g. ROUND( 12345.67, -1 ) is 120.00.
    • ROUND( x, -1 ) will only evaluate to 0 when -5 < x < 5 (i.e. -4.999999… <= x <= 4.999999…)
    • The same logical condition can be much-simply expressed using just comparison operators as HAVING x > -5 AND x < 5.
      • I assume you were trying to avoid having SUM() mentioned twice, as though you're concerned that HAVING SUM( Amount ) > -5 AND SUM( Amount ) < 5 would make SQL Server compute SUM twice.
        • SQL Server might be kinda dumb, but it isn't that dumb, I can assure you.
        • Another alternative is to use a CTE with the SUM as a named column in the initial SELECT and move the predicate to an outer WHERE clause instead of HAVING. This is the approach used in my answer below (look for WITH g AS).
  • My code uses double-quotes " to escape object-identifiers instead of [] square-brackets, this is because ISO/ANSI SQL uses " so it's the right thing to do, and makes your already-very-proprietary SQL code ever-so-slightly more portable.

  • I wrapped everything in a BEGIN TRY and BEGIN TRANSACTION - even though no normal tables are being mutated, because I assume that eventually you are going to make changes you'll want to persist, so I thought I'd scaffold that for you.



SET XACT_ABORT ON; <-- Everyone should be doing this: https://stackoverflow.com/questions/1150032/what-is-the-benefit-of-using-set-xact-abort-on-in-a-stored-procedure

BEGIN TRY;

    BEGIN TRANSACTION;

    -- As this script depends on the state of #table2, you should explicitly validate its contents before continuing:

    DECLARE @expectedTable2RowCount        int           = 1234;
    DECLARE @expectedTable2CurrencySum     decimal(19,2) = 1234567.89;

    DECLARE @actualTable2RowCount          int           = ( SELECT COUNT(*) FROM #table1 );
    DECLARE @actualTable1Table1CurrencySum decimal(19,2) = ( SELECT SUM( "Amount" ) FROM #table2 );

    IF @expectedTable2RowCount <> @actualTable2RowCount OR @expectedTable2CurrencySum <> @actualTable2Table1CurrencySum
    BEGIN
        THROW 50000, '#table1 contains unexpected data. Aborting script.', 0;
    END;

    ------------------

    DECLARE @myTable2Copy TABLE (
        "Order Id" int           NOT NULL,
        "City"     nvarchar(50)  NOT NULL,
        "Date"     date          NOT NULL,
        "Amount"   decimal(19,2) NOT NULL,
        "Currency" char(3)       NOT NULL,

        PRIMARY KEY ( "Order Id" )
    );

    INSERT INTO @myTable1Copy (
        "Order Id", "City", "Date", "Amount", "Currency"
    )
    SELECT
        "Order Id", "City", "Date", "Amount", "Currency"
    FROM
        #table2;

    ------------------

    DECLARE @myGroupedData TABLE (
        "City"      nvarchar(50)  NOT NULL,
        "Date"      date          NOT NULL,
        "Currency"  char(3)       NOT NULL,

        "SumAmount" decimal(19,2) NOT NULL,

        PRIMARY KEY ( "City", "Date", "Currency" )
    );

    WITH g AS (
        SELECT
            "City",
            "Date",
            "Currency",
            SUM( Amount ) AS SumAmount
        FROM
            @myTable2Copy
        GROUP BY
            "City",
            "Date",
            "Currency"
    )

    INSERT INTO @myGroupedData (
        "City", "Date", "Currency", SumAmount
    )
    SELECT
        "City", "Date", "Currency", SumAmount
    FROM
        g
    WHERE
        g.SumAmount > -5.00
        AND
        g.SumAmount <  5.00;

    ------------------

    COMMIT TRANSACTION;

END TRY;
BEGIN CATCH;

    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    THROW;

END CATCH;

Upvotes: 2

Related Questions