CaptainBli

Reputation: 4201

Set-based alternative to loop in SQL Server

I know that there are several posts about how BAD it is to loop in a SQL Server stored procedure, but I haven't found one that covers what I am trying to do. We are using a data connection that links directly into Excel.

I have seen some posts where people say they could convert most loops to a standard query, but for the life of me I am having trouble with this one.

I need all custIDs that have an order right after an event of type 38 or 40, but only if there is no other order between the event and the order from the first query.

So there are 3 parts. I first query for all orders (orders table) based on a time frame into a temporary table.

Select odate, custId into temp1 from orders where odate>'5/1/12'

Then I could use the temp table to inner join on the secondary table to get a customer event (LogEvent table) that may have occurred some time in the past prior to the current order.

Select eventdate, temp1.custID into temp2 from LogEvent inner join temp1 on 
temp1.custID=LogEvent.custID where EventType in (38,40) and temp1.odate>eventdate
order by eventdate desc

The problem here is that this query returns every matching event row for each customer from the first query, whereas I only want the latest event per customer. This is where, on the client side, I would loop to keep one event instead of all the old ones. But since the whole query has to run inside Excel, I can't really loop client side.

The third step could then use the results of the second query to check whether the event occurred between the most recent order and any previous order. I only want the rows where the event precedes the order and no other orders fall in between.

Select ordernum, shopcart.custID from shopcart right outer join temp2 on 
shopcart.custID=temp2.custID where shopcart.odate >= temp2.eventdate and
ordernum is null

Is there a way to simplify this and make it set-based so that it runs in SQL Server, instead of some kind of loop that I perform on the client?

Upvotes: 1

Views: 8417

Answers (2)

Justin Pihony

Reputation: 67135

If you are using SQL Server 2005 or later, you can use the ROW_NUMBER function:

;WITH myCTE AS
( 
SELECT
    eventdate, temp1.custID, 
    ROW_NUMBER() OVER (PARTITION BY temp1.custID ORDER BY eventdate desc) AS CustomerRanking 
FROM LogEvent 
JOIN temp1 
    ON temp1.custID=LogEvent.custID 
WHERE EventType IN (38,40) AND temp1.odate>eventdate
)
SELECT * into temp2 from myCTE WHERE CustomerRanking = 1;

This gets you the most recent event for each customer without a loop.

Also, you could use RANK; however, that will create duplicates for ties, whereas ROW_NUMBER guarantees no duplicate numbers within each partition.
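To make the difference on ties concrete, here is a minimal sketch. The table variable @events and its rows are made up for illustration (they are not from the question), and the multi-row VALUES list assumes SQL Server 2008 or later.

```sql
-- Hypothetical sample data: customer 1 has two events on the same date.
DECLARE @events TABLE (custID int, eventdate datetime);

INSERT INTO @events (custID, eventdate)
VALUES (1, '20120401'),
       (1, '20120401'),
       (1, '20120301'),
       (2, '20120315');

SELECT custID,
       eventdate,
       ROW_NUMBER() OVER (PARTITION BY custID ORDER BY eventdate DESC) AS rn,
       RANK()       OVER (PARTITION BY custID ORDER BY eventdate DESC) AS rnk
FROM @events
ORDER BY custID, rn;

-- Customer 1 gets rn = 1, 2, 3 but rnk = 1, 1, 3.
-- Filtering on rnk = 1 would keep both tied rows for customer 1;
-- filtering on rn = 1 keeps exactly one of them.
```

Which of the tied rows receives rn = 1 is arbitrary unless the ORDER BY inside OVER includes a tie-breaker column, so it may be worth adding one if the choice matters.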

Upvotes: 0

Gordon Linoff

Reputation: 1271023

This is a great example of switching to set-based notation.

First, I combined all three of your queries into a single query. In general, having a single query lets the query optimizer do what it does best: determine execution paths. It also prevents accidental serialization of queries on a multithreaded/multiprocessor machine.

The key is row_number() for ordering the events so the most recent has a value of 1. You'll see this in the final WHERE clause.

select temp2.custID, temp2.eventdate
from (Select eventdate, temp1.custID,
             row_number() over (partition by temp1.CustID order by EventDate desc) as seqnum
      from LogEvent inner join
           (Select odate, custId
            from order
            where odate>'5/1/12'
           ) temp1 
           on temp1.custID=LogEvent.custID
      -- no ORDER BY here: SQL Server rejects it in a derived table without TOP
      where EventType in (38,40) and temp1.odate>eventdate
     ) temp2 left outer join
     ShopCart
     -- the date test belongs in ON; in WHERE it would cancel the outer join
     on shopcart.custID=temp2.custID and shopcart.odate >= temp2.eventdate
 where seqnum = 1 and ordernum is null

I kept your naming conventions, even though I think "from order" should generate a syntax error. Even if it doesn't, it is bad practice to name tables and columns with reserved SQL words.
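If a table really is named after a reserved word, SQL Server lets you delimit the identifier with square brackets. A minimal sketch, assuming a table literally called order with the odate and custId columns used above:

```sql
-- ORDER is a reserved keyword, so the bare name does not parse:
--   SELECT odate, custId FROM order WHERE odate > '5/1/12';

-- Delimiting the identifier with square brackets makes it valid:
SELECT odate, custId
FROM [order]
WHERE odate > '5/1/12';
```

Renaming the table (for example to orders, as in the question) avoids the need for brackets everywhere it is referenced.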

Upvotes: 2
