Reputation: 37
I have a table with a couple million records in it which stores product views by SessionGUID (visitor), the table structure is:
ID (PK) SessionGUID (uniqueidentifier) ProductID (int) DateTime (datetime)
The purpose of the query is to return a list of Product ID's that were viewed by the same people who viewed a specific product ID (ie, customers who looked at X also looked at Y)
The query I'm using is as follows:
SELECT
A.ProductID
FROM
VISITOR_SESSIONS_PRODUCTVIEWS A
WHERE
A.SessionGUID IN (SELECT DISTINCT SessionGUID FROM VISITOR_SESSIONS_PRODUCTVIEWS WHERE ProductID = @ProductID)
GROUP BY
A.ProductID
ORDER BY
COUNT(A.ProductID) DESC
Is there a more efficient way to write this using GROUP BY/HAVING/PARTITON or some other more optimized method of getting the data the way I need it?
I am running SQL 2008 on my dev server, but will be running from SQL Azure when it goes live.
Upvotes: 1
Views: 103
Reputation: 45096
SELECT A.ProductID
FROM VISITOR_SESSIONS_PRODUCTVIEWS A
JOIN VISITOR_SESSIONS_PRODUCTVIEWS S
ON A.SessionGUID = S.SessionGUID
AND S.ProductID = @ProductID
GROUP B A.ProductID
ORDER BY COUNT(DISTINCT(A.ProductID)) DESC
Upvotes: 0
Reputation: 1236
There are several ways you could do write differently that may perform better:
Here is the temp table solution:
SELECT DISTINCT SessionGUID
INTO #sessionsWithProduct
FROM VISITOR_SESSIONS_PRODUCTVIEWS
WHERE ProductID = @ProductID;
SELECT
A.ProductID
FROM VISITOR_SESSIONS_PRODUCTVIEWS A
INNER JOIN #sessionsWithProduct S ON s.SessionGUID = A.SessionGUID
GROUP BY A.ProductID
ORDER BY COUNT(A.ProductID) DESC;
DROP TABLE #sessionsWithProduct;
Also extremely important is making sure that this table is indexed at least on ProductID and SessionGUID (one on each).
Upvotes: 0
Reputation: 1270443
Often, EXISTS
is more efficient than IN
:
SELECT A.ProductID
FROM VISITOR_SESSIONS_PRODUCTVIEWS A
WHERE EXISTS (SELECT 1
FROM VISITOR_SESSIONS_PRODUCTVIEWS B
WHERE B.ProductID = @ProductID AND
A.SessionGUID = B.SessionGUID
)
GROUP BY A.ProductID
ORDER BY COUNT(A.ProductID) DESC;
For best performance, you want an index on VISITOR_SESSIONS_PRODUCTVIEWS(SessionGUID, ProductId)
and perhaps also VISITOR_SESSIONS_PRODUCTVIEWS(ProductId)
.
EDIT:
You could try writing this using window functions, but I'm not sure the performance would be better:
select productid
from (select pv.*,
sum(case when productid = @productid then 1 else 0 end) over (partition by SessionGUID) as cnt
from visitor_sessions_productviews
) pv
where cnt > 0
group by productid
order by count(*) desc;
I'm not sure the performance would be better than the EXISTS
method.
Upvotes: 2