Reputation: 7577
I have a table that has these columns:
UserID1, UserID2, ProductID, PurchaseDate
The following query run in a purchases table and return couple of users that have more than a number of interactions between them, regardless the order in the last 31 days:
DECLARE @threshold AS INT
DECLARE @days AS INT
SET @threshold = 10
SET @days = 31
SELECT
UserID1, UserID2, COUNT(*) AS Counter
FROM
(SELECT
--do this to revert columns and count as one case both Col1,Col2 and Col2,Col1
CASE
WHEN UserID1 < UserID2
THEN UserID1
ELSE UserID2
END AS UserID1,
CASE
WHEN UserID1 < UserID2
THEN UserID2
ELSE UserID1
END AS UserID2
FROM
Purchases WITH(NOLOCK)
WHERE
Deadline BETWEEN DATEADD(day, -@days, GETDATE()) AND GETDATE()) t
GROUP BY
UserID1, UserID2
HAVING
COUNT(*) > @threshold
Yields to:
UserID1 UserID2 Counter
1 2 10
3 2 5
4 1 8
However, what I want is to return a table with the ProductID
and the PurchaseDate
in separate rows like this
UserID1 UserID2 ProductID PurchaseDate
1 2 12345 2017-01-18 00:13:52
1 2 5425 2017-01-12 15:10:02
1 2 64362 2017-01-05 10:10:02
..... for the 10 interactions
3 2 25235 2017-01-18 00:13:52
3 2 436346 2017-01-14 00:13:52
..... for the 5 interactions
4 1 23523 2017-01-14 00:13:52
4 1 135135 2017-01-09 00:13:52
..... for the 8 interactions
Is there any way without putting the results of the first query in a temp table and then join it again with the Purchases
table to find all the purchases?
Upvotes: 1
Views: 118
Reputation: 32695
If I understood you correctly, then simple windowed COUNT
would help here.
The optimiser should be smart enough to do it in one scan of the table.
DECLARE @threshold AS INT;
DECLARE @days AS INT;
SET @threshold = 10;
SET @days = 31;
WITH
CTE_Purchases
AS
(
SELECT
--do this to revert columns and count as one case both Col1,Col2 and Col2,Col1
CASE
WHEN UserID1 < UserID2
THEN UserID1
ELSE UserID2
END AS UserID1
,CASE
WHEN UserID1 < UserID2
THEN UserID2
ELSE UserID1
END AS UserID2
,ProductID
,PurchaseDate
FROM
Purchases
WHERE
Deadline BETWEEN DATEADD(day, -@days, GETDATE()) AND GETDATE()
)
,CTE_Counts
AS
(
SELECT
UserID1
,UserID2
,ProductID
,PurchaseDate
,COUNT(*) OVER (PARTITION BY UserID1, UserID2) AS Counter
-- calc COUNT for groups without explicit GROUP BY
FROM CTE_Purchases
)
SELECT
UserID1
,UserID2
,ProductID
,PurchaseDate
,Counter
FROM CTE_Counts
WHERE Counter > @threshold
-- this filter is instead of your HAVING
;
Upvotes: 2
Reputation: 1212
Disclaimer: I have not tested the code, it was written outside of the T-SQL IDE. Code below is based on the assumption that the following is always true: UserID1 != UserID2.
1) I would suggest using a MAX/MIN values solution to treat [Col1,Col2] the same way as [Col2,Col1]. It's probably going to perform better and handles NULLs properly too. You need SQL Server 2008 (or higher) for this to work.
SELECT
(SELECT MAX(usr) FROM (VALUES (UserID1), (UserID2) as User(usr)) as UserID1,
(SELECT MIN(usr) FROM (VALUES (UserID1), (UserID2) as User(usr)) as UserID2
FROM
Purchases
2) Now we need to actually count the interactions between them, which should be easy enough. To keep the code clean we can use a CTE for our previous statement, I'm adding the deadline filter there:
;WITH CTE_UserInteractions AS (
SELECT
(SELECT MAX(usr) FROM (VALUES (UserID1), (UserID2) as User(usr)) as FirstUser,
(SELECT MIN(usr) FROM (VALUES (UserID1), (UserID2) as User(usr)) as SecondUser
FROM
Purchases
WHERE
Deadline BETWEEN DATEADD(day,-@days,GETDATE()) AND GETDATE()
)
SELECT
FirstUser,
SecondUser
FROM
CTE_UserInteractions
GROUP BY
FirstUser, SecondUser
HAVING
COUNT(*) > @Threshold
Quick note here: one might find that calculating the left deadline boundary ahead of time positively impacts performance. For example, before running the batch we can do:
DECLARE @StartDate DATETIME = DATEADD(DAY,-@days,GETDATE())
And then we can use the @StartDate in the WHERE clause.
3) Finally, we can use CROSS APPLY to get the list of products and purchases for user "pairs" that are left as a result. We can either use a sub-select (my solution) or pre-populate a temporary table with the step #2 results if performance is affected.
;WITH CTE_UserInteractions AS (
SELECT
(SELECT MAX(usr) FROM (VALUES (UserID1), (UserID2) as User(usr)) as FirstUser,
(SELECT MIN(usr) FROM (VALUES (UserID1), (UserID2) as User(usr)) as SecondUser
FROM
Purchases AS p1
WHERE
Deadline BETWEEN DATEADD(day,-@days,GETDATE()) AND GETDATE()
)
SELECT
groupedUsers.FirstUser as UserID1,
groupedUsers.SecondUser as UserID2,
products.ProductID,
products.PurchaseDate
FROM (
SELECT
FirstUser,
SecondUser
FROM
CTE_UserInteractions
GROUP BY
FirstUser, SecondUser
HAVING
COUNT(*) > @Threshold
) groupedUsers
CROSS APPLY (
SELECT
ProductID, PurchaseDate
FROM
Purchases AS p1
WHERE
p1.UserID1 = FirstUser AND p1.UserID2 = SecondUser
UNION ALL
SELECT
ProductID, PurchaseDate
FROM
Purchases AS p2
WHERE
p2.UserID2 = FirstUser AND p2.UserID1 = SecondUser
) products
Upvotes: 0