Reputation: 53
UPDATE
I have tried most of the offered solutions and they work only when I pass a specific 'material_id'.
When I run the function on a bigger dataset it runs forever...
I work in Azure Synapse on a serverless SQL pool, so I am not sure if I can create indexes, as this was mentioned few times.
Could the problem really be in the indexing and not in the query itself?
It might be a silly question, but I am not as experienced as most of you so you will have to forgive me!!
I have a SQL function which works just fine when I am running it on a smaller table.
However, when I run the function in bigger queries the performance is so bad that I wouldn't even call it a working function...
From what I have read in some posts here, the fact that I have CASE WHEN
statements as a nested query in a WHERE
clause is the first problem.
The second, I imagine, is the fact that the query in the function is being execute for each row in the dataset (not sure, it is my understanding).
I am hoping that someone can help me re-write the query so that it works on bigger datasets.
The purpose of the function is to find the most recent material price at the time a material order is created.
CREATE FUNCTION my_db_name.get_price_at_mo_order
(@Material_Id VARCHAR(20),
@MO_Date DATE)
RETURNS table AS
RETURN
(SELECT
mp.price, m.product_code
FROM
my_db_name.price mp
LEFT JOIN
my_db_name.material m ON mp.material_id = m.material_id
WHERE
mp.material_id = @Material_Id
AND mp.start_date = (SELECT DISTINCT
CASE
WHEN (SELECT COUNT(*)
FROM my_db_name.price
WHERE material_id = @Material_Id
AND start_date <= @MO_Date) >= 1
THEN (SELECT MAX(CAST(start_date AS DATE))
FROM my_db_name.price
WHERE material_id = @Material_Id
AND start_date <= @MO_Date)
WHEN @MO_Date < (SELECT MIN(CAST(start_date AS DATE))
FROM my_db_name.price
WHERE material_id = @Material_Id)
THEN (SELECT MIN(CAST(start_date AS DATE))
FROM my_db_name.price
WHERE material_id = @Material_Id)
ELSE
(SELECT MAX(CAST(start_date AS DATE))
FROM my_db_name.price
WHERE material_id = @Material_Id)
END AS 'test'
FROM
my_db_name.price))
Upvotes: 0
Views: 100
Reputation: 5731
Main part of query is calculate best fit start_date
for target
date @MO_Date.
Best date is
We calculate this dates in one subquery and apply coalesce(date1,date2,date3)=mp.start_date
JOIN with table material
is not necessary, since only one row will be selected, it is easier to select this row from table material using a subquery.
I added 2 columns to the output result of the function for clarity.
Remove them as unnecessary.
See example.
Test data:
id | material_id | price | start_date |
---|---|---|---|
1 | material-100 | 2 | 2024-12-20 |
2 | material-100 | 2.1 | 2024-12-23 |
3 | material-100 | 2.3 | 2025-01-29 |
4 | material-100 | 2.4 | 2025-02-28 |
5 | material-100 | 2.5 | 2025-03-29 |
6 | material-101 | 4 | 2025-01-02 |
material_id | product_code |
---|---|
material-100 | material-100_code |
material-101 | material-101_code |
material-102 | material-102_code |
Function definition
CREATE FUNCTION get_price_at_mo_order(@Material_Id VARCHAR(20), @MO_Date DATE)
RETURNS table AS
RETURN
(
SELECT mp.price
,(select product_code from material m2
where m2.material_id=mp.material_id) as product_code
-- additional columns for clarity
,mp.start_date
,@MO_Date as paramDate
FROM price mp
WHERE mp.material_id = @Material_Id
and mp.start_date=(
SELECT coalesce(
MAX(case when start_date <= @MO_Date then start_date end) -- max date<=@MO_date near to @MO_date
,MIN(CAST(start_date AS DATE)) -- min date (after @MO_date if no rows before)
,MAX(CAST(start_date AS DATE)) -- max date at all - no need real
) dt
FROM price p2 WHERE p2.material_id = mp.material_Id
)
)
Tests
We have not rows before target date and have after. Take first after.
select * from get_price_at_mo_order('material-100',cast('2024-12-01' as date));
price | product_code | start_date | paramDate |
---|---|---|---|
2 | material-100_code | 2024-12-20 | 2024-12-01 |
We have rows before and after. Take last before.
select * from get_price_at_mo_order('material-100',cast('2025-01-01' as date));
price | product_code | start_date | paramDate |
---|---|---|---|
2.1 | material-100_code | 2024-12-23 | 2025-01-01 |
We have rows before and after. Take last before.
select * from get_price_at_mo_order('material-100',cast('2025-02-20' as date));
price | product_code | start_date | paramDate |
---|---|---|---|
2.3 | material-100_code | 2025-01-29 | 2025-02-20 |
We have rows before and not after. Take last before.
select * from get_price_at_mo_order('material-100',cast('2025-06-01' as date));
price | product_code | start_date | paramDate |
---|---|---|---|
2.5 | material-100_code | 2025-03-29 | 2025-06-01 |
Other tests
select * from get_price_at_mo_order('material-101',cast('2025-01-01' as date))
price | product_code | start_date | paramDate |
---|---|---|---|
4 | material-101_code | 2025-01-02 | 2025-01-01 |
select * from get_price_at_mo_order('material-101',cast('2025-06-01' as date))
price | product_code | start_date | paramDate |
---|---|---|---|
4 | material-101_code | 2025-01-02 | 2025-06-01 |
And last test will return empty result
select * from get_price_at_mo_order('material-102',cast('2025-06-01' as date))
price | product_code | start_date | paramDate |
---|
Upvotes: 2
Reputation: 72050
It looks like you can simplify your code like this, by using TOP (1)
in an apply to grab the first row of the price
table before the date. Then a second apply to grab any first row if the previous apply returned nothing.
CREATE OR ALTER FUNCTION my_db_name.get_price_at_mo_order(
@Material_Id VARCHAR(20),
@MO_Date DATE
)
RETURNS table
AS RETURN
(
SELECT
ISNULL(p1.price, p2.price) AS price,
m.product_code
FROM my_db_name.material m
OUTER APPLY (
SELECT TOP (1) -- possibly WITH TIES?
p.*
FROM my_db_name.price p
WHERE p.material_id = @Material_Id
AND p.start_date <= CAST(@MO_Date AS datetime)
ORDER BY p.start_date DESC
-- do you need multiple rows for a single date?
-- Then use ORDER BY CAST(p.start_date AS DATE) DESC
) p1
OUTER APPLY (
SELECT TOP (1)
p.*
FROM my_db_name.price p
WHERE p.material_id = @Material_Id
AND p1.price IS NULL -- previous APPLY was empty
ORDER BY p.start_date DESC
-- do you need multiple rows for a single date?
-- Then use ORDER BY CAST(p.start_date AS DATE) DESC
) p2
WHERE m.material_id = @Material_Id
AND ISNULL(p1.price, p2.price) IS NOT NULL
);
I would expect the following indexes to be the best:
material (material_id)
price (material_id, start_date DESC) INCLUDE (price)
Upvotes: 1
Reputation: 35333
cross apply: Return material price and product code for a material whose pricing date is provided. Ensure to retrun the price record with the date before the provided date and closest to it.
Something like
SELECT mp.price, m.product_code
FROM my_db_name.material m
CROSS APPLY (SELECT TOP 1 * FROM my_db_name.price mp
WHERE mp.material_id = @Material_Id
and start_Date <= @mo_Date
and mp.material_id = m.material_id
ORDER start_date Desc)
Upvotes: 1
Reputation: 227
Maybe you can try instead of running multiple subqueries, you can try function using TOP 1 with ORDER BY to get the most relevant price directly.
Having data sample would greatly help create better query.
CREATE FUNCTION my_db_name.get_price_at_mo_order(@Material_Id VARCHAR(20), @MO_Date DATE)
RETURNS TABLE
AS
RETURN
(
SELECT TOP 1
mp.price,
m.product_code
FROM my_db_name.price mp
LEFT JOIN my_db_name.material m
ON mp.material_id = m.material_id
WHERE mp.material_id = @Material_Id
ORDER BY
CASE
WHEN mp.start_date <= @MO_Date THEN mp.start_date
ELSE NULL
END DESC,
mp.start_date ASC
);
Finally to have a normalized and optimized query, you should also have a normalized db. Here are some steps and normalization forms you need to follow: Database Normalization
Upvotes: 1