Ron R
Ron R

Reputation: 37

Is there a more efficient way to write this SQL query?

I have a table with a couple million records in it which stores product views by SessionGUID (visitor), the table structure is:

ID (PK) SessionGUID (uniqueidentifier) ProductID (int) DateTime (datetime)

The purpose of the query is to return a list of Product ID's that were viewed by the same people who viewed a specific product ID (ie, customers who looked at X also looked at Y)

The query I'm using is as follows:

SELECT
A.ProductID
FROM
VISITOR_SESSIONS_PRODUCTVIEWS A
WHERE
A.SessionGUID IN (SELECT DISTINCT SessionGUID FROM VISITOR_SESSIONS_PRODUCTVIEWS WHERE ProductID = @ProductID)
GROUP BY
A.ProductID
ORDER BY
COUNT(A.ProductID) DESC

Is there a more efficient way to write this using GROUP BY/HAVING/PARTITON or some other more optimized method of getting the data the way I need it?

I am running SQL 2008 on my dev server, but will be running from SQL Azure when it goes live.

Upvotes: 1

Views: 103

Answers (3)

paparazzo
paparazzo

Reputation: 45096

SELECT A.ProductID
  FROM VISITOR_SESSIONS_PRODUCTVIEWS A
  JOIN VISITOR_SESSIONS_PRODUCTVIEWS S
    ON A.SessionGUID = S.SessionGUID 
   AND S.ProductID = @ProductID 
 GROUP B A.ProductID
 ORDER BY COUNT(DISTINCT(A.ProductID)) DESC

Upvotes: 0

Jordan Parker
Jordan Parker

Reputation: 1236

There are several ways you could do write differently that may perform better:

  • Temp Table of Sessions, then join to VISITOR_SESSIONS_PRODUCTVIEWS with those sessions
  • Join to sessions of VISITOR_SESSIONS_PRODUCTVIEWS with Product, then rejoin to VISITOR_SESSIONS_PRODUCTVIEWS with those sessions
  • EXISTS instead of DISTINCT querying on the existence of the product in question for that user's session

Here is the temp table solution:

SELECT DISTINCT SessionGUID 
INTO #sessionsWithProduct
FROM VISITOR_SESSIONS_PRODUCTVIEWS
WHERE ProductID = @ProductID;

SELECT
    A.ProductID
FROM VISITOR_SESSIONS_PRODUCTVIEWS A
    INNER JOIN #sessionsWithProduct S ON s.SessionGUID = A.SessionGUID
GROUP BY A.ProductID
ORDER BY COUNT(A.ProductID) DESC;

DROP TABLE #sessionsWithProduct;

Also extremely important is making sure that this table is indexed at least on ProductID and SessionGUID (one on each).

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1270443

Often, EXISTS is more efficient than IN:

SELECT A.ProductID
FROM VISITOR_SESSIONS_PRODUCTVIEWS A
WHERE EXISTS (SELECT 1
              FROM VISITOR_SESSIONS_PRODUCTVIEWS B
              WHERE B.ProductID = @ProductID AND
                    A.SessionGUID = B.SessionGUID
             )
GROUP BY A.ProductID
ORDER BY COUNT(A.ProductID) DESC;

For best performance, you want an index on VISITOR_SESSIONS_PRODUCTVIEWS(SessionGUID, ProductId) and perhaps also VISITOR_SESSIONS_PRODUCTVIEWS(ProductId).

EDIT:

You could try writing this using window functions, but I'm not sure the performance would be better:

select productid
from (select pv.*,
             sum(case when productid = @productid then 1 else 0 end) over (partition by SessionGUID) as cnt
      from visitor_sessions_productviews 
     ) pv
where cnt > 0
group by productid
order by count(*) desc;

I'm not sure the performance would be better than the EXISTS method.

Upvotes: 2

Related Questions