argunaw
argunaw

Reputation: 129

Using Gaps and Islands to Find Consecutive Hours/Dates- SQL/BigQuery

I have a table in BigQuery that looks like this:

Caller_Number | month  |  day| call_time
--------------|--------|-----|----------
1             |  5     |  15 | 12:56:17

I want to write a SQL query for BigQuery that will allow me to count the consecutive hours in which at least one call was made (sorted by caller_number), and the consecutive days in which at least 10 consecutive hours with calls occurred (sorter by caller_number). I have been looking at the existing resources on gaps and islands, but can't seem to figure out how to apply it to consecutive dates and hours.

Upvotes: 4

Views: 1410

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173086

Below is working example for consecutive hours
Steps are
1.“Extract” hour from call_time

HOUR(TIMESTAMP(CURRENT_DATE() + ' ' + call_time))

2.Find previous hour

LAG([hour]) OVER(PARTITION BY Caller_Number, [month], [day] ORDER BY [hour])

3.Calculate start of group of consecutive hours – 1 – start, 0 – group continuation

IFNULL(INTEGER([hour] - prev_hour > 1), 1)

4.Assign group number to each group

SUM(seq) OVER(PARTITION BY Caller_Number, [month], [day] ORDER BY [hour])

5.And finally – group by group number and count calls and hours

Hope this gives you good start for implementing similar logic for consecutive days on top of consec hours result

SELECT Caller_Number, [month], [day], seq_group, 
  EXACT_COUNT_DISTINCT([hour]) AS hours_count, COUNT(1) AS calls_count 
FROM (
  SELECT Caller_Number, [month], [day], [hour],  
    SUM(seq) OVER(PARTITION BY Caller_Number, [month], [day] 
                  ORDER BY [hour]) AS seq_group
  FROM (
    SELECT Caller_Number, [month], [day], [hour], 
      IFNULL(INTEGER([hour] - prev_hour > 1), 1) AS seq
    FROM (
      SELECT Caller_Number, [month], [day], [hour], 
        LAG([hour]) OVER(PARTITION BY Caller_Number, [month], [day] 
                         ORDER BY [hour]) AS prev_hour
      FROM (
        SELECT Caller_Number, [month], [day], 
          HOUR(TIMESTAMP(CURRENT_DATE() + ' ' + call_time)) AS [hour] 
        FROM YourTable
      )
    )
  )
)
GROUP BY Caller_Number, [month], [day], seq_group

Upvotes: 3

Related Questions