MariaT
MariaT

Reputation: 53

SQL Function with multiple CASE WHEN in the WHERE clause - slow performance

UPDATE

I have tried most of the offered solutions and they work only when I pass a specific 'material_id'.

When I run the function on a bigger dataset it runs forever...

I work in Azure Synapse on a serverless SQL pool, so I am not sure if I can create indexes, as this was mentioned few times.

Could the problem really be in the indexing and not in the query itself?

It might be a silly question, but I am not as experienced as most of you so you will have to forgive me!!


I have a SQL function which works just fine when I am running it on a smaller table.

However, when I run the function in bigger queries the performance is so bad that I wouldn't even call it a working function...

From what I have read in some posts here, the fact that I have CASE WHEN statements as a nested query in a WHERE clause is the first problem.

The second, I imagine, is the fact that the query in the function is being execute for each row in the dataset (not sure, it is my understanding).

I am hoping that someone can help me re-write the query so that it works on bigger datasets.

The purpose of the function is to find the most recent material price at the time a material order is created.

CREATE FUNCTION my_db_name.get_price_at_mo_order
    (@Material_Id VARCHAR(20), 
     @MO_Date DATE)
RETURNS table AS  
RETURN 
    (SELECT  
         mp.price, m.product_code 
     FROM 
         my_db_name.price mp 
     LEFT JOIN 
         my_db_name.material m ON mp.material_id = m.material_id 
     WHERE 
         mp.material_id = @Material_Id 
         AND mp.start_date = (SELECT DISTINCT 
                                  CASE 
                                      WHEN (SELECT COUNT(*)
                                            FROM my_db_name.price
                                            WHERE material_id = @Material_Id
                                              AND start_date <= @MO_Date) >= 1 
                                          THEN (SELECT MAX(CAST(start_date AS DATE))
                                                FROM my_db_name.price
                                                WHERE material_id = @Material_Id
                                                  AND start_date <= @MO_Date)
                                      WHEN @MO_Date < (SELECT MIN(CAST(start_date AS DATE))
                                                       FROM my_db_name.price
                                                       WHERE material_id = @Material_Id)
                                          THEN (SELECT MIN(CAST(start_date AS DATE)) 
                                          FROM my_db_name.price
                                          WHERE material_id = @Material_Id)
                                      ELSE  
                                          (SELECT MAX(CAST(start_date   AS DATE)) 
                                           FROM my_db_name.price
                                           WHERE material_id = @Material_Id)
                                  END AS 'test'
                              FROM 
                                  my_db_name.price))

Upvotes: 0

Views: 100

Answers (4)

ValNik
ValNik

Reputation: 5731

Main part of query is calculate best fit start_date for target date @MO_Date.
Best date is

  1. Nearest date before target date - date1
  2. Nearest (least) date after targe - date2
  3. Max date overall - date3. I don't think this date is really necessary.If date2 is null, this date will also be null.

We calculate this dates in one subquery and apply coalesce(date1,date2,date3)=mp.start_date

JOIN with table material is not necessary, since only one row will be selected, it is easier to select this row from table material using a subquery.

I added 2 columns to the output result of the function for clarity.
Remove them as unnecessary.
See example.
Test data:

id material_id price start_date
1 material-100 2 2024-12-20
2 material-100 2.1 2024-12-23
3 material-100 2.3 2025-01-29
4 material-100 2.4 2025-02-28
5 material-100 2.5 2025-03-29
6 material-101 4 2025-01-02
material_id product_code
material-100 material-100_code
material-101 material-101_code
material-102 material-102_code

Function definition

CREATE FUNCTION get_price_at_mo_order(@Material_Id VARCHAR(20), @MO_Date DATE)
RETURNS table AS  
RETURN 
( 
  SELECT mp.price
    ,(select product_code from material m2 
      where m2.material_id=mp.material_id) as product_code 
       -- additional columns for clarity
    ,mp.start_date
    ,@MO_Date as paramDate
  FROM price mp 
  WHERE mp.material_id = @Material_Id 
    and mp.start_date=(   
      SELECT coalesce(
           MAX(case when start_date <= @MO_Date then start_date end) -- max date<=@MO_date near to @MO_date
          ,MIN(CAST(start_date AS DATE))    -- min date (after @MO_date if no rows before)
          ,MAX(CAST(start_date AS DATE))    -- max date at all  - no need real
        ) dt
      FROM price p2 WHERE p2.material_id = mp.material_Id
   )
)

Tests
We have not rows before target date and have after. Take first after.

select * from get_price_at_mo_order('material-100',cast('2024-12-01' as date));
price product_code start_date paramDate
2 material-100_code 2024-12-20 2024-12-01

We have rows before and after. Take last before.

select * from get_price_at_mo_order('material-100',cast('2025-01-01' as date));
price product_code start_date paramDate
2.1 material-100_code 2024-12-23 2025-01-01

We have rows before and after. Take last before.

select * from get_price_at_mo_order('material-100',cast('2025-02-20' as date));
price product_code start_date paramDate
2.3 material-100_code 2025-01-29 2025-02-20

We have rows before and not after. Take last before.

select * from get_price_at_mo_order('material-100',cast('2025-06-01' as date));
price product_code start_date paramDate
2.5 material-100_code 2025-03-29 2025-06-01

Other tests

select * from get_price_at_mo_order('material-101',cast('2025-01-01' as date))
price product_code start_date paramDate
4 material-101_code 2025-01-02 2025-01-01
select * from get_price_at_mo_order('material-101',cast('2025-06-01' as date))
price product_code start_date paramDate
4 material-101_code 2025-01-02 2025-06-01

And last test will return empty result

select * from get_price_at_mo_order('material-102',cast('2025-06-01' as date))
price product_code start_date paramDate

fiddle

Upvotes: 2

Charlieface
Charlieface

Reputation: 72050

It looks like you can simplify your code like this, by using TOP (1) in an apply to grab the first row of the price table before the date. Then a second apply to grab any first row if the previous apply returned nothing.

CREATE OR ALTER FUNCTION my_db_name.get_price_at_mo_order(
  @Material_Id VARCHAR(20),
  @MO_Date DATE
)
RETURNS table
AS RETURN 
( 
    SELECT
      ISNULL(p1.price, p2.price) AS price,
      m.product_code 
    FROM my_db_name.material m
    OUTER APPLY (
        SELECT TOP (1)  -- possibly WITH TIES?
          p.*
        FROM my_db_name.price p
        WHERE p.material_id = @Material_Id
          AND p.start_date <= CAST(@MO_Date AS datetime)
        ORDER BY p.start_date DESC
        -- do you need multiple rows for a single date?
        -- Then use ORDER BY CAST(p.start_date AS DATE) DESC
    ) p1
    OUTER APPLY (
        SELECT TOP (1)
          p.*
        FROM my_db_name.price p
        WHERE p.material_id = @Material_Id
          AND p1.price IS NULL    -- previous APPLY was empty
        ORDER BY p.start_date DESC
        -- do you need multiple rows for a single date?
        -- Then use ORDER BY CAST(p.start_date AS DATE) DESC
    ) p2
    WHERE m.material_id = @Material_Id
      AND ISNULL(p1.price, p2.price) IS NOT NULL
);

I would expect the following indexes to be the best:

  • Clustered index on material (material_id)
  • Non-clustered index on price (material_id, start_date DESC) INCLUDE (price)

Upvotes: 1

xQbert
xQbert

Reputation: 35333

cross apply: Return material price and product code for a material whose pricing date is provided. Ensure to retrun the price record with the date before the provided date and closest to it.

Something like

SELECT mp.price, m.product_code 
FROM my_db_name.material m
CROSS APPLY (SELECT TOP 1 * FROM my_db_name.price mp 
             WHERE mp.material_id = @Material_Id
               and start_Date <= @mo_Date
               and mp.material_id = m.material_id
             ORDER start_date Desc)

Upvotes: 1

Ragnar
Ragnar

Reputation: 227

Maybe you can try instead of running multiple subqueries, you can try function using TOP 1 with ORDER BY to get the most relevant price directly.

Having data sample would greatly help create better query.

CREATE FUNCTION my_db_name.get_price_at_mo_order(@Material_Id VARCHAR(20), @MO_Date DATE)
RETURNS TABLE 
AS  
RETURN 
(
    SELECT TOP 1 
        mp.price, 
        m.product_code 
    FROM my_db_name.price mp
    LEFT JOIN my_db_name.material m
        ON mp.material_id = m.material_id
    WHERE mp.material_id = @Material_Id 
    ORDER BY 
        CASE 
            WHEN mp.start_date <= @MO_Date THEN mp.start_date 
            ELSE NULL 
        END DESC, 
        mp.start_date ASC 
);

Finally to have a normalized and optimized query, you should also have a normalized db. Here are some steps and normalization forms you need to follow: Database Normalization

Upvotes: 1

Related Questions