Reputation: 11
I'm having an issue where I am running a query and each time it produces different results. I'm working with relatively large sets of data (700,000+) and the differences are about 50-100.
I'm taking raw data from a staging table on a specific date (so the data is not changing) and then I'm loading that into a temp table.
Each time I check the count of the Temp table, it is the same so the issue is not there. However on the next part, I'm adding some grouping around dates and other specific details. And when I run this, it will have a different population each time it is run.
As an example, this is a very basic version of my script:
IF OBJECT_ID(N'tempdb..#Table1') IS NOT NULL
DROP TABLE [#Table1]
SELECT ORDER ID
,CITY
,DATE
,AMOUNT
,CURRENCY
INTO #TABLE1
FROM #TABLE2 --This is the data from the staging table
--GROUPING
IF OBJECT_ID(N'tempdb..#Table3') IS NOT NULL
DROP TABLE [#Table3]
SELECT CITY
,DATE
,ROUND(SUM(CONVERT(FLOAT, AMOUNT)),-1)
,CURRENCY
INTO #TABLE3
FROM #TABLE1
GROUP BY CITY,DATE,CURRENCY
HAVING ROUND(SUM(CONVERT(FLOAT, AMOUNT)),-1) = 0
Each time I run this, the record count for #Table3 changes. Any ideas what could be causing this? as mentioned, #Table1 shows the full data set consistently and the underlying source data has not changed.
Upvotes: 0
Views: 3507
Reputation: 155390
#TemporaryTables
- just say no.
#TemporaryTables
are a powerful and useful tool, however their lifetime is based on client-connections or procedure scope, which makes me instantly want to avoid them because the lifetime of a client-connection or session is non-deterministic.#tempoaryTables
to be dropped - it's that uncertainty that puts me off.#temporaryTables
' lifetime is bound to your connection rather than any meaningful scope-of-work means that they also make query-batches that use them non-deterministic, which I think is what you're experiencing.float
) to represent monetary values.
0.1
as a IEEE-754 value.SUM
in SQL) becomes non-deterministic. This is why SQL Server restricts how float
values are used in deterministic contexts like indexed-views.
money
or decimal
types when working with money/currency values.
money
should be avoided and decimal(n,2)
used instead because of how money
handles money × money
multiplication or money / money
division, which is fair, but there's no reason to multiply money by money... so this will remain a controversy amongst SQL Server DBAs for a while, methinks.I recommend you change your script to use only table-variables. Like so:
I assume that you absolutely have to keep on using #table2
for source-data.
Your query has HAVING ROUND( x, -1 ) = 0
- which I don't think is useful - nor particularly clear.
ROUND( x, -1 )
means "round to the nearest tenth-place (including all digits to the right, including digits to the right of the radix place).
ROUND( 12345.67, -1 )
is 120.00
.ROUND( x, -1 )
will only evaluate to 0
when -5 < x < 5
(i.e. -4.999999… <= x <= 4.999999…
)HAVING x > -5 AND x < 5
.
SUM()
mentioned twice, as though you're concerned that HAVING SUM( Amount ) > -5 AND SUM( Amount ) < 5
would make SQL Server compute SUM
twice.
SUM
as a named column in the initial SELECT
and move the predicate to an outer WHERE
clause instead of HAVING
. This is the approach used in my answer below (look for WITH g AS
).My code uses double-quotes "
to escape object-identifiers instead of []
square-brackets, this is because ISO/ANSI SQL uses "
so it's the right thing to do, and makes your already-very-proprietary SQL code ever-so-slightly more portable.
I wrapped everything in a BEGIN TRY
and BEGIN TRANSACTION
- even though no normal tables are being mutated, because I assume that eventually you are going to make changes you'll want to persist, so I thought I'd scaffold that for you.
SET XACT_ABORT ON; <-- Everyone should be doing this: https://stackoverflow.com/questions/1150032/what-is-the-benefit-of-using-set-xact-abort-on-in-a-stored-procedure
BEGIN TRY;
BEGIN TRANSACTION;
-- As this script depends on the state of #table2, you should explicitly validate its contents before continuing:
DECLARE @expectedTable2RowCount int = 1234;
DECLARE @expectedTable2CurrencySum decimal(19,2) = 1234567.89;
DECLARE @actualTable2RowCount int = ( SELECT COUNT(*) FROM #table1 );
DECLARE @actualTable1Table1CurrencySum decimal(19,2) = ( SELECT SUM( "Amount" ) FROM #table2 );
IF @expectedTable2RowCount <> @actualTable2RowCount OR @expectedTable2CurrencySum <> @actualTable2Table1CurrencySum
BEGIN
THROW 50000, '#table1 contains unexpected data. Aborting script.', 0;
END;
------------------
DECLARE @myTable2Copy TABLE (
"Order Id" int NOT NULL,
"City" nvarchar(50) NOT NULL,
"Date" date NOT NULL,
"Amount" decimal(19,2) NOT NULL,
"Currency" char(3) NOT NULL,
PRIMARY KEY ( "Order Id" )
);
INSERT INTO @myTable1Copy (
"Order Id", "City", "Date", "Amount", "Currency"
)
SELECT
"Order Id", "City", "Date", "Amount", "Currency"
FROM
#table2;
------------------
DECLARE @myGroupedData TABLE (
"City" nvarchar(50) NOT NULL,
"Date" date NOT NULL,
"Currency" char(3) NOT NULL,
"SumAmount" decimal(19,2) NOT NULL,
PRIMARY KEY ( "City", "Date", "Currency" )
);
WITH g AS (
SELECT
"City",
"Date",
"Currency",
SUM( Amount ) AS SumAmount
FROM
@myTable2Copy
GROUP BY
"City",
"Date",
"Currency"
)
INSERT INTO @myGroupedData (
"City", "Date", "Currency", SumAmount
)
SELECT
"City", "Date", "Currency", SumAmount
FROM
g
WHERE
g.SumAmount > -5.00
AND
g.SumAmount < 5.00;
------------------
COMMIT TRANSACTION;
END TRY;
BEGIN CATCH;
IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
THROW;
END CATCH;
Upvotes: 2