Reputation: 53
I have an application where the user can retrieve a list. The user is allowed to add certain filters, for example by article, article group, customer, or customer group.
When a filter is empty (or null), the query must ignore that filter.
What is the most performant way to build your query so it can handle a lot (10+) of different filters (and joins)?
My current approach is the following, but it doesn't scale very well:
CREATE PROCEDURE [dbo].[GetFilteredList]
    @start datetime,
    @stop datetime,
    @ArticleList varchar(max),       -- '123,456,789'
    @ArticleGroupList varchar(max),
    @CustomerList varchar(max),
    @CustomerGroupList varchar(max)  -- '1,2,3,4,5'
    -- More filters here...
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @Articles TABLE (value VARCHAR(10));
    INSERT INTO @Articles (value)
    SELECT * FROM [dko_db].[dbo].fnSplitString(@ArticleList, ',');

    DECLARE @ArticleGroups TABLE (value VARCHAR(10));
    INSERT INTO @ArticleGroups (value)
    SELECT * FROM [dko_db].[dbo].fnSplitString(@ArticleGroupList, ',');

    DECLARE @Customers TABLE (value VARCHAR(10));
    INSERT INTO @Customers (value)
    SELECT * FROM [dko_db].[dbo].fnSplitString(@CustomerList, ',');

    DECLARE @CustomerGroups TABLE (value VARCHAR(10));
    INSERT INTO @CustomerGroups (value)
    SELECT * FROM [dko_db].[dbo].fnSplitString(@CustomerGroupList, ',');

    SELECT * -- Some columns here
    FROM [dbo].[Orders] o
    LEFT OUTER JOIN [dbo].[Article] a ON o.ArticleId = a.Id
    LEFT OUTER JOIN [dbo].[ArticleGroup] ag ON a.GroupId = ag.Id
    LEFT OUTER JOIN [dbo].[Customer] c ON o.CustomerId = c.Id
    LEFT OUTER JOIN [dbo].[CustomerGroup] cg ON c.GroupId = cg.Id
    -- More joins here
    WHERE o.OrderDate BETWEEN @start AND @stop
      AND (ISNULL(@ArticleList, '') = '' OR a.ArticleCode IN (SELECT value FROM @Articles))
      AND (ISNULL(@ArticleGroupList, '') = '' OR ag.GroupCode IN (SELECT value FROM @ArticleGroups))
      AND (ISNULL(@CustomerList, '') = '' OR c.CustomerCode IN (SELECT value FROM @Customers))
      AND (ISNULL(@CustomerGroupList, '') = '' OR cg.GroupCode IN (SELECT value FROM @CustomerGroups))
    ORDER BY c.Name, o.OrderDate;
END
Upvotes: 1
Views: 132
Reputation: 7928
There are a lot of "low-hanging fruit" performance improvements here.
First, lose ORDER BY c.Name, o.OrderDate; unless the caller actually relies on that order, it's just needless sorting.
Second, for your "list" parameters (e.g. @ArticleList): if you don't need VARCHAR(MAX), change the data type(s) to VARCHAR(8000). VARCHAR(MAX) is much slower than VARCHAR(8000); never use MAX data types unless you are certain they're required.
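For example, the parameter list would become something along these lines (assuming your delimited lists always fit in 8000 characters):
@start datetime,
@stop datetime,
@ArticleList varchar(8000),       -- '123,456,789'
@ArticleGroupList varchar(8000),
@CustomerList varchar(8000),
@CustomerGroupList varchar(8000)  -- '1,2,3,4,5'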
Third, you can skip dumping your split values into table variables; that's just needless overhead. Lose all those declarations and inserts, then change THIS:
... a.ArticleCode IN (SELECT value FROM @Articles))
TO:
... a.ArticleCode IN (SELECT value FROM dbo.fnSplitString(@ArticleList, ',')))
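Applied to all of the filters, the WHERE clause ends up looking something like this (same logic, no table variables):
WHERE o.OrderDate BETWEEN @start AND @stop
  AND (ISNULL(@ArticleList, '') = '' OR a.ArticleCode IN (SELECT value FROM dbo.fnSplitString(@ArticleList, ',')))
  AND (ISNULL(@ArticleGroupList, '') = '' OR ag.GroupCode IN (SELECT value FROM dbo.fnSplitString(@ArticleGroupList, ',')))
  AND (ISNULL(@CustomerList, '') = '' OR c.CustomerCode IN (SELECT value FROM dbo.fnSplitString(@CustomerList, ',')))
  AND (ISNULL(@CustomerGroupList, '') = '' OR cg.GroupCode IN (SELECT value FROM dbo.fnSplitString(@CustomerGroupList, ',')))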
Fourth, if fnSplitString is not an inline table-valued function (i.e. you see BEGIN and END in its DDL), it will be slow. An inline splitter will be much faster; consider DelimitedSplit8K or DelimitedSplit8K_LEAD.
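As a side note, if you're on SQL Server 2016 or later, the built-in STRING_SPLIT is already an inline option and would work here too:
... a.ArticleCode IN (SELECT value FROM STRING_SPLIT(@ArticleList, ',')))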
Last, I would add OPTION (RECOMPILE), as this query is highly unlikely to benefit from plan caching. The recompile lets the optimizer see the actual parameter values on each execution and build the plan around the filters that are actually in play.
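Syntactically, the hint goes at the very end of the statement, e.g.:
WHERE o.OrderDate BETWEEN @start AND @stop
-- ...the other filter predicates...
OPTION (RECOMPILE);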
Beyond that, when joining a bunch of tables, check the execution plan, see where most of the work is being done, and use that information to index accordingly.
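For example, if the plan shows a scan on Orders driven by the date range, something along these lines is a reasonable starting point (hypothetical index; INCLUDE whichever columns your SELECT actually needs):
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
ON dbo.Orders (OrderDate)
INCLUDE (ArticleId, CustomerId);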
Upvotes: 1