Rishav Ghosh
Rishav Ghosh

Reputation: 75

Is there any way to optimize the recursive sql query?

I need to fill missing date values in orders table. DDL:

create table orders(order_date date, order_value int)

insert into orders values('2022-11-01',100),('2022-11-04 ',200),('2022-11-08',300)

Expected output is as:

order_date | order_value
-----------------------
2022-11-01 | 100
2022-11-02 | 100
2022-11-03 | 100
2022-11-04 | 200
2022-11-05 | 200
2022-11-06 | 200
2022-11-07 | 200
2022-11-08 | 300

I have solved the problem in ms sql using recursive query listed below.

with cte as (
  select min(order_date) [min_date], MAX(order_date) [max_date]
  FROM orders
),  cte2 AS(
  SELECT min_date [date]
  FROM cte
  UNION ALL
  SELECT dateadd(day,1,date) [date]
  FROM cte2
  WHERE date < (SELECT max_date FROM cte)
 ), cte3 as(
  select date [order_date], order_value
  FROM cte2 
  LEFT JOIN orders on date = order_date
 )
 SELECT order_date,
        FIRST_VALUE(order_value) IGNORE NULLS
        OVER(ORDER BY order_date desc ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) [order_value]
 FROM cte3

Is there any alternate approach to solve this problem or any way to optimize the recursive query?

Thanks in advance.

Upvotes: 1

Views: 126

Answers (2)

John Cappelletti
John Cappelletti

Reputation: 81990

A calendar would be most useful, but another option is an ad-hoc numbers table

Example

Select Order_Date = coalesce(dateadd(DAY,N-1,Order_Date),order_date)
      ,Order_Value
 From ( Select *,NDays = datediff(DAY,order_date,lead(order_date,1) over (order by order_date)) From Orders ) A
 left Join ( Select Top 1000 N=Row_Number() Over (Order By (Select NULL)) From master..spt_values n1, master..spt_values n2 ) B
   on N<=NDays
 Order by Order_Date

Results

enter image description here

Upvotes: 1

Thom A
Thom A

Reputation: 95830

As mentioned in the comments, as you are on SQL Server 2022 you can make use of GENERATE_SERIES here, which will be much more performant that a recursive Common Table Expression (rCTE). On prior versions, you can use an (inline) tally or a Calendar table.

WITH Dates AS(
    SELECT MIN(order_date) AS MinDate,
           MAX(order_date) AS MaxDate
    FROM dbo.orders),
DateSeries AS(
    SELECT DATEADD(DAY,GS.value,D.MinDate) AS Date
    FROM Dates D
         CROSS APPLY GENERATE_SERIES(0,DATEDIFF(DAY,MinDate,MaxDate),1) GS)
SELECT DS.Date,
       FIRST_VALUE(O.order_value) IGNORE NULLS OVER (ORDER BY DS.Date desc ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS [order_value]
FROM DateSeries DS
     LEFT JOIN dbo.orders O ON DS.Date = O.order_date
ORDER BY DS.Date;

db<>fiddle

Upvotes: 1

Related Questions