Reputation: 125
I have a table that logs website activity with the following Columns and Data
ID Date Source Revenue
1 2013-10-01 A 0
2 2013-10-01 A 0
3 2013-10-01 B 10
1 2013-10-02 A 40
4 2013-10-03 B 0
3 2013-10-03 B 0
4 2013-10-04 A 10
I am trying to create a table that takes each transaction (Revenue > 0) and counts all of the visits by source in individual columns for the last 30 days. It should look something like this.
ID Date Source Revenue Count_A Count_B
3 2013-10-01 B 10 0 1
1 2013-10-02 A 40 2 0
4 2013-10-04 A 10 1 1
I have tried using a subquery for each of these columns, but the Counts are way off and I am not sure why.
Select ID,
Date,
Source,
Revenue,
(SELECT Count(*)
FROM table t2
WHERE t2.Date between t.Date-30 and t.Date and Source = 'A') AS Count_A
(SELECT Count(*)
FROM table t3
WHERE t3.Date between t.Date-30 and t.Date and Source = 'B') AS Count_B
FROM table t
Where Revenue > 0
Order By WMEID
I am using Microsoft SQL Server.
Upvotes: 2
Views: 111
Reputation: 1269693
Use a lateral join:
Select l.*, l2.*
from logs l outer apply
(select sum(case when l2.source = 'A' then 1 else 0 end) as count_a,
sum(case when l2.source = 'B' then 1 else 0 end) as count_b
from logs l2
where l2.id = l.id and
l2.date >= dateadd(day, -30, l.date) and
l2.date <= l.date
) l2
where l.Revenue > 0
order By l.WMEID;
I think the issue with your approach is that you are not matching the ids.
Upvotes: 3
Reputation: 8101
Your counts are off because your sub-selects aren't correlated to the outer query, so the totals are coming up independent of the other data in the row. Also, there's no GROUP BY
in the sub-selects, so you're getting a total table count. And I'm not so sure about that date logic.
You could correct all this by adding the correlation to each sub-select (WHERE...t2.ID = t.ID AND t2.Date = t.Date, etc
) and including an appropriate GROUP BY
clause for each of those queries. But that's rather a lot of typing, hard to maintain, and hard to read. It will also probably generate multiple table scans, so it could become a performance issue.
Instead, I'd opt for conditional aggregation:
Select t.ID,
t.Date,
t.Source,
SUM(t.Revenue) AS Revenue,
SUM(CASE WHEN t.Source = 'A' THEN 1 ELSE 0 END) AS Count_A,
SUM(CASE WHEN t.Source = 'B' THEN 1 ELSE 0 END) AS Count_B
FROM mytable t
Where Revenue > 0
AND t.Date >= DATEADD(DAY, -30, CAST(GETDATE() AS date))
AND t.Date < CAST(GETDATE() AS date)
GROUP BY
t.ID,
t.Date,
t.Source
Order By t.Date
Results (Based on the structure in the question, not the data):
+----+------------+--------+---------+---------+---------+
| ID | Date | Source | Revenue | Count_A | Count_B |
+----+------------+--------+---------+---------+---------+
| 3 | 2020-05-01 | B | 60 | 0 | 2 |
| 1 | 2020-05-02 | A | 40 | 1 | 0 |
| 4 | 2020-05-04 | A | 10 | 1 | 0 |
+----+------------+--------+---------+---------+---------+
Here's a SQL Fiddle.
Upvotes: 1