Reputation: 1907
I have created a user defined function to gain performance with queries containing 'WHERE col IN (...)' like this case:
SELECT myCol1, myCol2
FROM myTable
WHERE myCol3 IN (100, 200, 300, ..., 4900, 5000);
The queries are generated from an web application and are in some cases much more complex. The function definition looks like this:
CREATE FUNCTION [dbo].[udf_CSVtoIntTable]
(
@CSV VARCHAR(MAX),
@Delimiter CHAR(1) = ','
)
RETURNS
@Result TABLE
(
[Value] INT
)
AS
BEGIN
DECLARE @CurrStartPos SMALLINT;
SET @CurrStartPos = 1;
DECLARE @CurrEndPos SMALLINT;
SET @CurrEndPos = 1;
DECLARE @TotalLength SMALLINT;
-- Remove space, tab, linefeed, carrier return
SET @CSV = REPLACE(@CSV, ' ', '');
SET @CSV = REPLACE(@CSV, CHAR(9), '');
SET @CSV = REPLACE(@CSV, CHAR(10), '');
SET @CSV = REPLACE(@CSV, CHAR(13), '');
-- Add extra delimiter if needed
IF NOT RIGHT(@CSV, 1) = @Delimiter
SET @CSV = @CSV + @Delimiter;
-- Get total string length
SET @TotalLength = LEN(@CSV);
WHILE @CurrStartPos < @TotalLength
BEGIN
SET @CurrEndPos = CHARINDEX(@Delimiter, @CSV, @CurrStartPos);
INSERT INTO @Result
VALUES (CAST(SUBSTRING(@CSV, @CurrStartPos, @CurrEndPos - @CurrStartPos) AS INT));
SET @CurrStartPos = @CurrEndPos + 1;
END
RETURN
END
The function is intended to be used like this (or as an INNER JOIN):
SELECT myCol1, myCol2
FROM myTable
WHERE myCol3 IN (
SELECT [Value]
FROM dbo.udf_CSVtoIntTable('100, 200, 300, ..., 4900, 5000', ',');
Do anyone have some optimiztion idears of my function or other ways to improve performance in my case? Is there any drawbacks that I have missed?
I am using MS SQL Server 2005 Std and .NET 2.0 framework.
Upvotes: 1
Views: 1842
Reputation: 1907
The CLR solution did not give me an good performance so I will use a recursive query. So here is the definition of the SP I will use (mostly based on Erland Sommarskogs examples):
CREATE FUNCTION [dbo].[priudf_CSVtoIntTable]
(
@CSV VARCHAR(MAX),
@Delimiter CHAR(1) = ','
)
RETURNS
@Result TABLE
(
[Value] INT
)
AS
BEGIN
-- Remove space, tab, linefeed, carrier return
SET @CSV = REPLACE(@CSV, ' ', '');
SET @CSV = REPLACE(@CSV, CHAR(9), '');
SET @CSV = REPLACE(@CSV, CHAR(10), '');
SET @CSV = REPLACE(@CSV, CHAR(13), '');
WITH csvtbl(start, stop) AS
(
SELECT start = CONVERT(BIGINT, 1),
stop = CHARINDEX(@Delimiter, @CSV + @Delimiter)
UNION ALL
SELECT start = stop + 1,
stop = CHARINDEX(@Delimiter, @CSV + @Delimiter, stop + 1)
FROM csvtbl
WHERE stop > 0
)
INSERT INTO @Result
SELECT CAST(SUBSTRING(@CSV, start, CASE WHEN stop > 0 THEN stop - start ELSE 0 END) AS INT) AS [Value]
FROM csvtbl
WHERE stop > 0
OPTION (MAXRECURSION 1000)
RETURN
END
Upvotes: 1
Reputation: 1907
Thank for the input, I have to admit that I have made som bad research before I started my work. I found that Erland Sommarskog has written a lot of this problem on his webpage, after your responeses and after reading his page I decided that I will try to make a CLR to solve this.
I tried a recursive query, this resulted in good performance but I will try CLR function anyway.
Upvotes: 0
Reputation: 103587
that loop will kill performance!
create a table like this:
CREATE TABLE Numbers
(
Number int not null primary key
)
that has rows containing values 1 to 8000 or so and use this function:
CREATE FUNCTION [dbo].[FN_ListAllToNumberTable]
(
@SplitOn char(1) --REQUIRED, the character to split the @List string on
,@List varchar(8000) --REQUIRED, the list to split apart
)
RETURNS
@ParsedList table
(
RowNumber int
,ListValue varchar(500)
)
AS
BEGIN
/*
DESCRIPTION: Takes the given @List string and splits it apart based on the given @SplitOn character.
A table is returned, one row per split item, with a columns named "RowNumber" and "ListValue".
This function workes for fixed or variable lenght items.
Empty and null items will be included in the results set.
PARAMETERS:
@List varchar(8000) --REQUIRED, the list to split apart
@SplitOn char(1) --OPTIONAL, the character to split the @List string on, defaults to a comma ","
RETURN VALUES:
a table, one row per item in the list, with a column name "ListValue"
TEST WITH:
----------
SELECT * FROM dbo.FN_ListAllToNumTable(',','1,12,123,1234,54321,6,A,*,|||,,,,B')
DECLARE @InputList varchar(200)
SET @InputList='17;184;75;495'
SELECT
'well formed list',LEFT(@InputList,40) AS InputList,h.Name
FROM Employee h
INNER JOIN dbo.FN_ListAllToNumTable(';',@InputList) dt ON h.EmployeeID=dt.ListValue
WHERE dt.ListValue IS NOT NULL
SET @InputList='17;;;184;75;495;;;'
SELECT
'poorly formed list join',LEFT(@InputList,40) AS InputList,h.Name
FROM Employee h
INNER JOIN dbo.FN_ListAllToNumTable(';',@InputList) dt ON h.EmployeeID=dt.ListValue
SELECT
'poorly formed list',LEFT(@InputList,40) AS InputList, ListValue
FROM dbo.FN_ListAllToNumTable(';',@InputList)
**/
/*this will return empty rows, and row numbers*/
INSERT INTO @ParsedList
(RowNumber,ListValue)
SELECT
ROW_NUMBER() OVER(ORDER BY number) AS RowNumber
,LTRIM(RTRIM(SUBSTRING(ListValue, number+1, CHARINDEX(@SplitOn, ListValue, number+1)-number - 1))) AS ListValue
FROM (
SELECT @SplitOn + @List + @SplitOn AS ListValue
) AS InnerQuery
INNER JOIN Numbers n ON n.Number < LEN(InnerQuery.ListValue)
WHERE SUBSTRING(ListValue, number, 1) = @SplitOn
RETURN
END /*Function FN_ListAllToNumTable*/
I have other versions that do not return empty or null rows, ones that return just the item and not the row number, etc. Look in the header comment to see how to use this as part of a JOIN, which is much faster than in a where clause.
Upvotes: 1
Reputation: 16578
Using a UDF in a WHERE clause or (worse) a subquery is asking for trouble. The optimizer sometimes gets it right, but often gets it wrong and evaluates the function once for every row in your query, which you don't want.
If your parameters are static (they appear to be) and you can issue a multistatement batch, I'd load the results of your UDF into a table variable, then use a join against the table variable to do your filtering. This should work more reliably.
Upvotes: 1
Reputation: 3206
I'm not sure of the performance increase, but I would use it as an inner join and get away from the inner select statement.
Upvotes: 1