Reputation: 309821
Assume that I have a table (MyTable
) as follows:
item_id | date
----------------
1 | 2016-06-08
1 | 2016-06-07
1 | 2016-06-05
1 | 2016-06-04
1 | 2016-05-31
...
2 | 2016-06-08
2 | 2016-06-06
2 | 2016-06-04
2 | 2016-05-31
...
3 | 2016-05-31
...
I would like to build a weekly summary table that reports on a running 7 day window. The window would basically say "How many unique item_id
s were reported in the preceding 7 days"?
So, in this case, the output table would look something like:
date | weekly_ids
----------------------
2016-05-31| 3 # All 3 were present on the 31st
2016-06-01| 3 # All 3 were present on the 31st which is < 7 days before the 1st
2016-06-02| 3 # Same
2016-06-03| 3 # Same
2016-06-04| 3 # Same
2016-06-05| 3 # Same
2016-06-06| 3 # Same
2016-06-07| 3 # Same
2016-06-08| 2 # item 3 was not present for the entire last week so it does not add to the count.
I've tried:
SELECT
item_id,
date,
MAX(present) OVER (
PARTITION BY item_id
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS is_present
FROM (
# Inner query
SELECT
item_id,
date,
1 AS present,
FROM MyTable
)
GROUP BY date
ORDER BY date DESC
This feels like it is going in the right direction. But as it is, the window runs over the wrong time-frame when dates aren't present (too many dates) and it also doesn't output records for dates when the item_id
wasn't present (even if it was present on the previous date). Is there a simple resolution to this problem?
If it's helpful and necessary
item_id
s in existence.Upvotes: 3
Views: 1129
Reputation: 172944
Try below example. It can give you direction to explore
Purely GBQ - Legacy SQL
SELECT date, items FROM (
SELECT
date, COUNT(DISTINCT item_id) OVER(ORDER BY sec RANGE BETWEEN 60*60*24*2 PRECEDING AND CURRENT ROW) AS items
FROM (
SELECT
item_id, date, timestamp_to_sec(timestamp(date)) AS sec
FROM (
SELECT calendar.day AS date, MyTable.item_id AS item_id
FROM (
SELECT DATE(DATE_ADD(TIMESTAMP('2016-05-28'), pos - 1, "DAY")) AS day
FROM (
SELECT ROW_NUMBER() OVER() AS pos, *
FROM (FLATTEN((
SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP(CURRENT_DATE()), TIMESTAMP('2016-05-28')), '.'),'') AS h
FROM (SELECT NULL)),h
)))
) AS calendar
LEFT JOIN (
SELECT date, item_id
FROM
(SELECT 1 AS item_id, '2016-06-08' AS date),
(SELECT 1 AS item_id, '2016-06-07' AS date),
(SELECT 1 AS item_id, '2016-06-05' AS date),
(SELECT 1 AS item_id, '2016-06-04' AS date),
(SELECT 1 AS item_id, '2016-05-28' AS date),
(SELECT 2 AS item_id, '2016-06-08' AS date),
(SELECT 2 AS item_id, '2016-06-06' AS date),
(SELECT 2 AS item_id, '2016-06-04' AS date),
(SELECT 2 AS item_id, '2016-05-31' AS date),
(SELECT 3 AS item_id, '2016-05-31' AS date),
(SELECT 3 AS item_id, '2016-06-05' AS date)
) AS MyTable
ON calendar.day = MyTable.date
)
)
)
GROUP BY date, items
ORDER BY date
Please note
Upvotes: 1
Reputation: 15155
I have created a temp table to hold dates, however, you probably would benefit from adding a permanent table to your database for these joins. Trust me it will cause less headaches.
DECLARE @my_table TABLE
(
item_id int,
date DATETIME
)
INSERT @my_table SELECT 1,'2016-06-08'
INSERT @my_table SELECT 1,'2016-06-07'
INSERT @my_table SELECT 1,'2016-06-05'
INSERT @my_table SELECT 1,'2016-06-04'
INSERT @my_table SELECT 1,'2016-05-31'
INSERT @my_table SELECT 2,'2016-06-08'
INSERT @my_table SELECT 2,'2016-06-06'
INSERT @my_table SELECT 2,'2016-06-04'
INSERT @my_table SELECT 2,'2016-05-31'
INSERT @my_table SELECT 3,'2016-05-31'
DECLARE @TrailingDays INT=7
DECLARE @LowDate DATETIME='01/01/2016'
DECLARE @HighDate DATETIME='12/31/2016'
DECLARE @Calendar TABLE(CalendarDate DATETIME)
DECLARE @LoopDate DATETIME=@LowDate
WHILE(@LoopDate<=@HighDate) BEGIN
INSERT @Calendar SELECT @LoopDate
SET @LoopDate=DATEADD(DAY,1,@LoopDate)
END
SELECT
date=HighDate,
weekly_ids=COUNT(DISTINCT item_id)
FROM
(
SELECT
HighDate=C.CalendarDate,
LowDate=LAG(C.CalendarDate, @TrailingDays,0) OVER (ORDER BY C.CalendarDate)
FROM
@Calendar C
WHERE
CalendarDate BETWEEN @LowDate AND @HighDate
)AS X
LEFT OUTER JOIN @my_table MT ON MT.date BETWEEN LowDate AND HighDate
GROUP BY
LowDate,
HighDate
Upvotes: 2