Reputation: 157
Let's say I have a table activities
with fields starttime (TIMESTAMP)
and stoptime (TIMESTAMP)
. I want to find a moment during which there were most activities occurring. The query should return first such moment.
I have tried to get all starttime
timestamps and then for each of them count the number of activities that are occurring at that moment. Then find maximum:
#standardSQL
SELECT
time,
(
SELECT COUNT(*)
FROM activities
WHERE starttime <= time AND time <= stoptime
) AS cnt
FROM (
SELECT DISTINCT starttime AS time
FROM activities
ORDER BY time
)
ORDER BY cnt DESC, time ASC
LIMIT 1
Unfortunately it says: LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
I think that a proper algorithm for this outside of databases world would be to get all starttimes
and stoptimes
put them in an array in a way that they will be distinguishable, sort it and then go sequentially by that array looking for a maximum moment. However, I have no idea how to express such an algorithm in SQL.
I have seen this but I don't think it helps in any way.
Upvotes: 1
Views: 544
Reputation: 173028
Consider below version
From my point of view it returns more practical output - which is - all periods (respective start and end) of consecutive activity of the same level
So you will now not just starts but whole period (start and end) with the highest activity. and not just one, but all of them
#standardSQL
WITH intervals AS (
SELECT time AS start_, LEAD(time) OVER(ORDER BY time) AS end_
FROM (
SELECT DISTINCT time FROM (
SELECT starttime AS time FROM activities UNION ALL
SELECT stoptime AS time FROM activities ))
),
equals AS (
SELECT start_, end_, COUNT(1) AS cumsum
FROM intervals AS i
JOIN activities AS a
ON i.start_ >= a.starttime AND i.end_ <= a.stoptime
GROUP BY start_, end_
),
grps AS (
SELECT
start_, end_, cumsum,
IFNULL(
CAST(end_ = LEAD(start_) OVER(ORDER BY start_) AND LEAD(cumsum) OVER(ORDER BY start_) = cumsum AS INT64),
CAST(NOT((start_ = LAG(end_) OVER(ORDER BY start_) AND LAG(cumsum) OVER(ORDER BY start_) = cumsum)) AS INT64)
) AS flag
FROM equals
)
SELECT MIN(start_) AS start_, MAX(end_) AS end_, cumsum
FROM (
SELECT start_, end_, cumsum, SUM(flag) OVER(ORDER BY start_) AS grp
FROM grps
)
GROUP BY cumsum, grp
ORDER BY start_
you can play with above using dummy activities table
WITH activities AS (
SELECT 1 AS starttime, 3 AS stoptime UNION ALL
SELECT 1 AS starttime, 4 AS stoptime UNION ALL
SELECT 4 AS starttime, 5 AS stoptime UNION ALL
SELECT 7 AS starttime, 8 AS stoptime UNION ALL
SELECT 7 AS starttime, 10 AS stoptime UNION ALL
SELECT 8 AS starttime, 12 AS stoptime
)
or
WITH activities AS (
SELECT TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 1 MINUTE) AS starttime, TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 3 MINUTE) AS stoptime UNION ALL
SELECT TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 1 MINUTE) AS starttime, TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 4 MINUTE) AS stoptime UNION ALL
SELECT TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 4 MINUTE) AS starttime, TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 5 MINUTE) AS stoptime UNION ALL
SELECT TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 MINUTE) AS starttime, TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 8 MINUTE) AS stoptime UNION ALL
SELECT TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 MINUTE) AS starttime, TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 10 MINUTE) AS stoptime UNION ALL
SELECT TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 8 MINUTE) AS starttime, TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 12 MINUTE) AS stoptime
)
Upvotes: 1
Reputation: 157
I have achieved something close to the algorithm I have described in the question. It works decently fast but if you find anything better I'd be glad to see it.
#standardSQL
SELECT time, SUM(add) OVER(ORDER BY time ASC, add DESC) AS cumsum
FROM (
SELECT starttime AS time, 1 AS add
FROM activities UNION ALL
SELECT stoptime AS time, -1 AS add
FROM activities
)
ORDER BY cumsum DESC
Upvotes: 3