Chris Barry
Chris Barry

Reputation: 93

MSSQL cast( [varcharColumn] to int) in SELECT gets executed before WHERE clause filters out bad values

Assume the following schema and query:

Please look past the glaring design issues with having values in a varchar column that we expect to be ints.

create table dbo.Parent (
    Id bigint NOT NULL,
    TypeId int NOT NULL
)

create table dbo.Child (
    Id bigint NOT NULL,
    ParentId bigint NOT NULL,
    TypeId int NOT NULL,
    varcharColumn varchar(300) NULL
)

select cast(c.varcharColumn as int)
from dbo.Parent p (nolock)
    inner join dbo.Child c (nolock)
        on p.Id = c.ParentId
            and c.TypeId = 2
where p.TypeId = 13

The break:

We get a cast break due to a value that cannot be converted to an int. In this case: "123-1". The strange thing is that the value being cast gets filtered out of the final result set.

For example, this returns zero results

select c.varcharColumn
from dbo.Parent p (nolock)
    inner join dbo.Child c (nolock)
        on p.Id = c.ParentId
            and c.TypeId = 2
where p.TypeId = 13
    and c.varcharColumn = '123-1'

The query plan ends up looking at the Child table and actually applying the cast function before the where clause.

We were able to fix this by creating a new index on the child table (it was doing a PK scan)

create index [NCIDX_dbo_Child__TypeId] on dbo.Child (
    TypeId
)
include (
    ParentId,
    varcharColumn
)

It now filters on the parent table's where clause first.

Is there any way to fix this without the extra index? And again, please refrain from any suggestions related to fixing our schema. That is definitely the proper fix in this case.

I'm mostly interested in understanding why it applied the cast before it filtered the result set.

Thanks

Edit - Answer:

Many thanks to both Aaron and Gordon. If I ever get more than 15 rep points, I'll come back and up both of your replies.

We ended up needing Gordon's answer since we wanted to use this query in a view. A few folks at the office were wary of using a case statement because they prefer to have more control over ensuring that we have a smaller result set first (Aaron's answer), however it all boils down to looking at the query plan and checking your read counts.

Again, thanks for all of the responses!

Upvotes: 7

Views: 10852

Answers (4)

rakeshyadvanshi
rakeshyadvanshi

Reputation: 293

Technical perspective to this as per me

 create table dbo.Parent (
    Id bigint NOT NULL,
    TypeId int NOT NULL
)
    create table dbo.Child (
        Id bigint NOT NULL,
        ParentId bigint NOT NULL,
        TypeId int NOT NULL,
        varcharColumn varchar(300) NULL
    )

    select cast(c.varcharColumn as int)
    from dbo.Parent p (nolock)
        inner join dbo.Child c (nolock)
            on p.Id = c.ParentId
                and c.TypeId = 2
    where p.TypeId = 13

while selection you are doing inner join that leads to table lookup and filters out the data Now according to me, SQL server is applying " so-called optimization" and making cast operation along with that. and obviously it is applying this cast before filtering that leads to conversion issue.

one more issue whose curiosity leads me here,

SELECT    MAX( Cast(dealer_number AS INT) + 1)
                    FROM   dealer_number dn
                    INNER JOIN dealer d
                    ON dn.dealer_seq = d.dealer_seq 
                    INNER JOIN dealer_type(nolock) dt
                    ON dt.dealer_number_seq = dn.dealer_number_seq
                    INNER JOIN program_dealer_type(nolock) pdt
                     ON pdt.program_dealer_type_seq = dt.program_dealer_type_seq
                    WHERE 
                    Isnumeric(dealer_number) = 1 AND
                    pdt.dealer_type = 'Dealer'
                    AND d.program_seq = 57

got following issue

The conversion of the varchar value '10054239051' overflowed an int column.

I have added Isnumeric for the 100% proof that I didn't get any alphanumeric number.

but as Isnumeric is being while table look for performance it leads to respective error

now just remove Isnumeric and it works fine

SELECT    MAX( Cast(dealer_number AS INT) + 1)
                    FROM   dealer_number dn
                    INNER JOIN dealer d
                    ON dn.dealer_seq = d.dealer_seq 
                    INNER JOIN dealer_type(nolock) dt
                    ON dt.dealer_number_seq = dn.dealer_number_seq
                    INNER JOIN program_dealer_type(nolock) pdt
                    ON pdt.program_dealer_type_seq = dt.program_dealer_type_seq
                    WHERE 
                    pdt.dealer_type = 'Dealer'
                    AND d.program_seq = 57

now I just removed that pre-processing filter and it just solves the problem but to 100% proof that I didn't get any alphanumeric number I had to implement case in the select

SELECT    MAX(case WHEN 
                    ISNUMERIC(DEALER_NUMBER )=1 
                        THEN  Cast(dealer_number AS INT) 
                        ELSE 0 END 
                    + 1) 
                    FROM   dealer_number dn
                    INNER JOIN dealer d
                    ON dn.dealer_seq = d.dealer_seq 
                    INNER JOIN dealer_type(nolock) dt
                    ON dt.dealer_number_seq = dn.dealer_number_seq
                    INNER JOIN program_dealer_type(nolock) pdt
                    ON pdt.program_dealer_type_seq = dt.program_dealer_type_seq
                    WHERE 
                    pdt.dealer_type = 'Dealer'
                    AND d.program_seq = 57

Above will solve both this issue. Anyone had a point on this most welcome. :)

Upvotes: 0

anon
anon

Reputation:

You can't easily control the way SQL Server processes your query. You can figure out some of the why by deep diving into the execution plan, but understanding that is the least of your problems in this specific case I think. You can do a little with join hints, perhaps, but that's hacky to me and the behavior is still not guaranteed (especially as you move to new versions etc). Two workarounds you could try are:

;WITH c AS 
(
  SELECT varcharColumn, ParentID, TypeId
   FROM dbo.Child AS c
   WHERE c.TypeId = 2
   AND ISNUMERIC(varcharColumn) = 1 --*
)
SELECT CONVERT(INT, c.varcharColumn)
FROM dbo.Parent AS p
INNER JOIN c
ON c.ParentId = p.Id
WHERE p.TypeId = 13;

But I have heard of cases where even separating this out into a CTE could lead to the bad plan that led the convert to occur first. So it may be that you need to break it out even further:

SELECT varcharColumn, ParentID, TypeId
INTO #c
   FROM dbo.Child AS c
   WHERE c.TypeId = 2
   AND ISNUMERIC(varcharColumn) = 1; --*

SELECT CONVERT(INT, c.varcharColumn)
  FROM dbo.Parent AS p
  INNER JOIN #c AS c
  ON c.ParentId = p.Id
  WHERE p.TypeId = 13;

(I also talk about the CASE expression solution in this answer.)

If you are on SQL Server 2012, you can simply do this - now it doesn't matter if the convert is attempted before the filter, and you don't have to rely on the wonky ISNUMERIC() function.*

SELECT TRY_CONVERT(INT, c.varcharColumn)
  FROM dbo.Parent AS p
  INNER JOIN dbo.Child AS c
  ON c.ParentId = p.Id
  WHERE c.TypeId = 2
  AND p.TypeId = 13;

* Please note that IsNumeric is not perfect. I wrote this article several years ago to help deal with this.

Upvotes: 7

Gordon Linoff
Gordon Linoff

Reputation: 1269743

First, this is not a "glaring design issue". SQL is a descriptive language of the output, not a procedural language that specifies how prcoessing is being done. There is no guarantee of the order of processing, in general, and this is an advantage. I might say there is a design issue, but it is around the general handling of exceptions in SQL statements.

According to SQL Server documentation (http://msdn.microsoft.com/en-us/library/ms181765.aspx), you can depend on the order of evauation for a CASE statement for scalar expressions. So, the following should work:

select (case when isnumeric(c.varcharColumn) = 1 then cast(c.varcharColumn as int) end)

Or, to get closer to an "int" expression:

select (case when isnumeric(c.varcharColumn) = 1 and c.varcharColumn not like '%.%' and c.varcharColumn not like '%e%'
             then cast(c.varcharColumn as int)
        end)

At least your code is doing an explicit CAST. This situation is much nastier when the casts are implicit (and there are hundreds of columns).

Upvotes: 7

invertedSpear
invertedSpear

Reputation: 11054

You can move the filter to a subquery, which would filter out the bad vals first, then cast everything. This doesn't answer why, but it does get you what you want in what is, in my opinion, the simplest way.

select cast(varcharColumn as int)
FROM(
select c.varcharColumn

from dbo.Parent p (nolock)
    inner join dbo.Child c (nolock)
        on p.Id = c.ParentId
            and c.TypeId = 2
where p.TypeId = 13
) table1

Upvotes: -1

Related Questions