ShieldData
ShieldData

Reputation: 125

Count Visits by Source for 30 Day Prior Period for each Purchase

I have a table that logs website activity with the following Columns and Data

ID  Date         Source  Revenue

1   2013-10-01   A       0
2   2013-10-01   A       0
3   2013-10-01   B       10
1   2013-10-02   A       40
4   2013-10-03   B       0
3   2013-10-03   B       0
4   2013-10-04   A       10

I am trying to create a table that takes each transaction (Revenue > 0) and counts all of the visits by source in individual columns for the last 30 days. It should look something like this.

ID  Date         Source  Revenue Count_A  Count_B

3   2013-10-01   B       10      0        1
1   2013-10-02   A       40      2        0
4   2013-10-04   A       10      1        1

I have tried using a subquery for each of these columns, but the Counts are way off and I am not sure why.

Select ID,
       Date,
       Source,
       Revenue,
       (SELECT Count(*) 
               FROM table t2
               WHERE t2.Date between t.Date-30 and t.Date and Source = 'A') AS Count_A
       (SELECT Count(*) 
               FROM table t3
               WHERE t3.Date between t.Date-30 and t.Date and Source = 'B') AS Count_B
  FROM table t
  Where Revenue > 0
  Order By WMEID

I am using Microsoft SQL Server.

Upvotes: 2

Views: 111

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269693

Use a lateral join:

Select l.*, l2.*
from logs l outer apply
     (select sum(case when l2.source = 'A' then 1 else 0 end) as count_a,
             sum(case when l2.source = 'B' then 1 else 0 end) as count_b
      from logs l2
      where l2.id = l.id and
            l2.date >= dateadd(day, -30, l.date) and
            l2.date <= l.date
     ) l2
where l.Revenue > 0
order By l.WMEID;

I think the issue with your approach is that you are not matching the ids.

Upvotes: 3

Eric Brandt
Eric Brandt

Reputation: 8101

Your counts are off because your sub-selects aren't correlated to the outer query, so the totals are coming up independent of the other data in the row. Also, there's no GROUP BY in the sub-selects, so you're getting a total table count. And I'm not so sure about that date logic.

You could correct all this by adding the correlation to each sub-select (WHERE...t2.ID = t.ID AND t2.Date = t.Date, etc) and including an appropriate GROUP BY clause for each of those queries. But that's rather a lot of typing, hard to maintain, and hard to read. It will also probably generate multiple table scans, so it could become a performance issue.

Instead, I'd opt for conditional aggregation:

Select t.ID,
       t.Date,
       t.Source,
       SUM(t.Revenue) AS Revenue,
       SUM(CASE WHEN t.Source = 'A' THEN 1 ELSE 0 END) AS Count_A,
       SUM(CASE WHEN t.Source = 'B' THEN 1 ELSE 0 END) AS Count_B
  FROM mytable t
  Where Revenue > 0
    AND t.Date >= DATEADD(DAY, -30, CAST(GETDATE() AS date))
    AND t.Date < CAST(GETDATE() AS date)
  GROUP BY 
       t.ID,
       t.Date,
       t.Source
  Order By t.Date

Results (Based on the structure in the question, not the data):

+----+------------+--------+---------+---------+---------+
| ID |    Date    | Source | Revenue | Count_A | Count_B |
+----+------------+--------+---------+---------+---------+
|  3 | 2020-05-01 | B      |      60 |       0 |       2 |
|  1 | 2020-05-02 | A      |      40 |       1 |       0 |
|  4 | 2020-05-04 | A      |      10 |       1 |       0 |
+----+------------+--------+---------+---------+---------+

Here's a SQL Fiddle.

Upvotes: 1

Related Questions