view (or inside query) returns error because of a record that is not part of its result (??)

I have a strange issue with a query. Here is a visual explanation of what happens, following is the textual description and the code to reproduce the problem.

a simple query containing the following inner join:

       TABLE 1                                           TABLE 2
   id        year   month       inner join          id       year    month
'2020.05'    2020   5           (on id)           '2020.05'  2020      5
'2020.05'    2020   5                             '8888.88'  8888      88
'2020.05'    2020   5

This query returns the three records with id '2020.05' example :

SELECT table2.year, table2.month from QueryAbove

returns 3 times 2020,5 :

2020,5
2020,5
2020,5

the problem occurs if I add in the select the function:

SELECT datefromparts(year, month, 1), table2.year, table2.month from QueryAbove

Instead of returning 3 times 2020,5 it will fail because there is the unvalid value 88 for the datefromparts function in the month column of table2....(??) and although this record is not in the result.

This problem occurs to me starting with 3 records in table 1, with 2 or 1 it works fine.

The original query I have that experiences this problem has a lot more data and starts failing with 50 records, so if it works for you try adding more data in table 1 until it fails.

I could reproduce the problem with simple data, please find the code below.

I created a view that returns the result of the inner join query (but the problem still occurs without the view)

This looks like memory related, but I can't explain why and how this is occurring, and I would be very interested to understand this.

If someone knows what is going on, I thank you in advance for the information. please tell me if more information is needed or if something is not clear in my explanation.

------ create tables -----

if OBJECT_ID('dbo.test_view') is not null
begin
drop view dbo.test_view
end

IF OBJECT_ID(N'dbo.temp_fact') IS NOT NULL
BEGIN
DROP TABLE dbo.temp_fact
END
GO

IF OBJECT_ID(N'dbo.temp_dim') IS NOT NULL
BEGIN
DROP TABLE dbo.temp_dim
END
GO
create table temp_fact
(
  submission_bk varchar(30) not null
)

create table temp_dim
(
  submission_bk varchar(30) not null,
  [Submission_Year] int not null,
  [Submission_Month] int not null
)

insert into temp_dim
select '2020.05', 2020, 5 union all
select '8888.88', 8888, 88

insert into temp_fact
select '2020.05' union all 
select '2020.05' union all  
-- select '2020.05' union all 
select '2020.05'

---- create view ----------

create view test_view as (
   select 

   FACT.submission_bk as submission_id,
   s.Submission_Year,
   s.Submission_Month
   from temp_fact as FACT
   inner join temp_dim as s on s.Submission_BK = FACT.Submission_BK 
)
 

------ query the view -------

select 
  [submission_id],
  [Submission_Year],
  [Submission_Month],
  datefromparts(Submission_Year, Submission_Month, 1) as bdate
from test_view

Upvotes: 2

Views: 464

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269883

The logical evaluation of the query has nothing to do with this. The "logical evaluation" really refers to the scoping rules used during the compilation phase. Unfortunately, this part of the documentation has not bee updated in decades to clarify this point.

The SQL query itself is run as a directed acyclic graph (DAG). The compilation and optimization phase constructs the graph from the original query. And there is little similarity between the two. That is, there is no "group by" node in the DAG; no "where" node in that DAG.

One of the consequences is that SQL Server moves operations around. It often finds that running simple expressions before filtering is more efficient than running them after filtering. And this results in errors such as the one you find.

Personally, I think this is a bug. Your code should not generate an error, because it is valid on the data that you choose. Alas, Microsoft does not agree with that interpretation.

Unfortunately, there is no try_datefromparts() function. But you can use another try_ function. I would probably use:

try_convert(date, convert(varchar(100), year*10000 + month*100 + 1))

Upvotes: 2

gotqn
gotqn

Reputation: 43636

The interesting in such cases is that sometimes the statement will not rise an error.

From the docs, the Logical Processing Order of the SELECT statement is as follows:

  1. FROM
  2. ON
  3. JOIN
  4. WHERE
  5. GROUP BY
  6. WITH CUBE or WITH ROLLUP
  7. HAVING
  8. SELECT
  9. DISTINCT
  10. ORDER BY
  11. TOP

But sometimes, the engine may start performing an operation in the SELECT before other clauses are finished in order to optimize the query. This is also, pointed in the docs:

The preceding sequence is usually true. However, there are uncommon cases where the sequence may differ.

For example, suppose you have a clustered index on a view, and the view excludes some table rows, and the view's SELECT column list uses a CONVERT that changes a data type from varchar to integer. In this situation, the CONVERT may execute before the WHERE clause executes. Uncommon indeed. Often there is a way to modify your view to avoid the different sequence, if it matters in your case.

In your case, the following should solve the issue:

select 
[submission_id]
,[Submission_Year]
,[Submission_Month]
,datefromparts(Submission_Year,Submission_Month,1) as bdate
from test_view
WHERE TRY_CAST(CONCAT(Submission_Year, '-', Submission_Month, '-01') AS DATE) IS NOT NULL

Upvotes: 0

Related Questions