Reputation: 29
I have a strange issue with a query. Here is a visual explanation of what happens, following is the textual description and the code to reproduce the problem.
a simple query containing the following inner join:
TABLE 1 TABLE 2
id year month inner join id year month
'2020.05' 2020 5 (on id) '2020.05' 2020 5
'2020.05' 2020 5 '8888.88' 8888 88
'2020.05' 2020 5
This query returns the three records with id '2020.05' example :
SELECT table2.year, table2.month from QueryAbove
returns 3 times 2020,5 :
2020,5
2020,5
2020,5
the problem occurs if I add in the select the function:
SELECT datefromparts(year, month, 1), table2.year, table2.month from QueryAbove
Instead of returning 3 times 2020,5 it will fail because there is the unvalid value 88 for the datefromparts
function in the month column of table2....(??) and although this record is not in the result.
This problem occurs to me starting with 3 records in table 1, with 2 or 1 it works fine.
The original query I have that experiences this problem has a lot more data and starts failing with 50 records, so if it works for you try adding more data in table 1 until it fails.
I could reproduce the problem with simple data, please find the code below.
I created a view that returns the result of the inner join query (but the problem still occurs without the view)
This looks like memory related, but I can't explain why and how this is occurring, and I would be very interested to understand this.
If someone knows what is going on, I thank you in advance for the information. please tell me if more information is needed or if something is not clear in my explanation.
------ create tables -----
if OBJECT_ID('dbo.test_view') is not null
begin
drop view dbo.test_view
end
IF OBJECT_ID(N'dbo.temp_fact') IS NOT NULL
BEGIN
DROP TABLE dbo.temp_fact
END
GO
IF OBJECT_ID(N'dbo.temp_dim') IS NOT NULL
BEGIN
DROP TABLE dbo.temp_dim
END
GO
create table temp_fact
(
submission_bk varchar(30) not null
)
create table temp_dim
(
submission_bk varchar(30) not null,
[Submission_Year] int not null,
[Submission_Month] int not null
)
insert into temp_dim
select '2020.05', 2020, 5 union all
select '8888.88', 8888, 88
insert into temp_fact
select '2020.05' union all
select '2020.05' union all
-- select '2020.05' union all
select '2020.05'
---- create view ----------
create view test_view as (
select
FACT.submission_bk as submission_id,
s.Submission_Year,
s.Submission_Month
from temp_fact as FACT
inner join temp_dim as s on s.Submission_BK = FACT.Submission_BK
)
------ query the view -------
select
[submission_id],
[Submission_Year],
[Submission_Month],
datefromparts(Submission_Year, Submission_Month, 1) as bdate
from test_view
Upvotes: 2
Views: 464
Reputation: 1269883
The logical evaluation of the query has nothing to do with this. The "logical evaluation" really refers to the scoping rules used during the compilation phase. Unfortunately, this part of the documentation has not bee updated in decades to clarify this point.
The SQL query itself is run as a directed acyclic graph (DAG). The compilation and optimization phase constructs the graph from the original query. And there is little similarity between the two. That is, there is no "group by" node in the DAG; no "where" node in that DAG.
One of the consequences is that SQL Server moves operations around. It often finds that running simple expressions before filtering is more efficient than running them after filtering. And this results in errors such as the one you find.
Personally, I think this is a bug. Your code should not generate an error, because it is valid on the data that you choose. Alas, Microsoft does not agree with that interpretation.
Unfortunately, there is no try_datefromparts()
function. But you can use another try_
function. I would probably use:
try_convert(date, convert(varchar(100), year*10000 + month*100 + 1))
Upvotes: 2
Reputation: 43636
The interesting in such cases is that sometimes the statement will not rise an error.
From the docs, the Logical Processing Order of the SELECT statement
is as follows:
But sometimes, the engine may start performing an operation in the SELECT
before other clauses are finished in order to optimize the query. This is also, pointed in the docs:
The preceding sequence is usually true. However, there are uncommon cases where the sequence may differ.
For example, suppose you have a clustered index on a view, and the view excludes some table rows, and the view's SELECT column list uses a CONVERT that changes a data type from varchar to integer. In this situation, the CONVERT may execute before the WHERE clause executes. Uncommon indeed. Often there is a way to modify your view to avoid the different sequence, if it matters in your case.
In your case, the following should solve the issue:
select
[submission_id]
,[Submission_Year]
,[Submission_Month]
,datefromparts(Submission_Year,Submission_Month,1) as bdate
from test_view
WHERE TRY_CAST(CONCAT(Submission_Year, '-', Submission_Month, '-01') AS DATE) IS NOT NULL
Upvotes: 0