Reputation: 129
I have a table in BigQuery that looks like this:
Caller_Number | month | day| call_time
--------------|--------|-----|----------
1 | 5 | 15 | 12:56:17
I want to write a SQL query for BigQuery that will allow me to count the consecutive hours in which at least one call was made (sorted by caller_number), and the consecutive days in which at least 10 consecutive hours with calls occurred (sorter by caller_number). I have been looking at the existing resources on gaps and islands, but can't seem to figure out how to apply it to consecutive dates and hours.
Upvotes: 4
Views: 1410
Reputation: 173086
Below is working example for consecutive hours
Steps are
1.“Extract” hour from call_time
HOUR(TIMESTAMP(CURRENT_DATE() + ' ' + call_time))
2.Find previous hour
LAG([hour]) OVER(PARTITION BY Caller_Number, [month], [day] ORDER BY [hour])
3.Calculate start of group of consecutive hours – 1 – start, 0 – group continuation
IFNULL(INTEGER([hour] - prev_hour > 1), 1)
4.Assign group number to each group
SUM(seq) OVER(PARTITION BY Caller_Number, [month], [day] ORDER BY [hour])
5.And finally – group by group number and count calls and hours
Hope this gives you good start for implementing similar logic for consecutive days on top of consec hours result
SELECT Caller_Number, [month], [day], seq_group,
EXACT_COUNT_DISTINCT([hour]) AS hours_count, COUNT(1) AS calls_count
FROM (
SELECT Caller_Number, [month], [day], [hour],
SUM(seq) OVER(PARTITION BY Caller_Number, [month], [day]
ORDER BY [hour]) AS seq_group
FROM (
SELECT Caller_Number, [month], [day], [hour],
IFNULL(INTEGER([hour] - prev_hour > 1), 1) AS seq
FROM (
SELECT Caller_Number, [month], [day], [hour],
LAG([hour]) OVER(PARTITION BY Caller_Number, [month], [day]
ORDER BY [hour]) AS prev_hour
FROM (
SELECT Caller_Number, [month], [day],
HOUR(TIMESTAMP(CURRENT_DATE() + ' ' + call_time)) AS [hour]
FROM YourTable
)
)
)
)
GROUP BY Caller_Number, [month], [day], seq_group
Upvotes: 3