Reputation: 77
My question is to segregate data from time information in SQL
I have transaction data for trams. Something like this
DateTime
2018-04-03T08:06:04
2018-04-03T08:07:27
2018-04-03T08:18:18
2018-04-03T10:08:27
2018-04-03T10:22:24
2018-04-03T12:08:50
2018-04-03T12:24:49
2018-04-03T12:24:51
This is the customer tap on information for specific tram on specific day. This contains 3 journey of the same tram
How can I rank them as 3 different journeys so desired result will be like this
Rank DateTime
1 2018-04-03T08:06:04
1 2018-04-03T08:07:27
1 2018-04-03T08:18:18
2 2018-04-03T10:08:27
2 2018-04-03T10:22:24
3 2018-04-03T12:08:50
3 2018-04-03T12:24:49
3 2018-04-03T12:24:51
I tried by adding +- 40 mins to time and all transactions coming in the range will be one unique journey. But couldn't succeed.
Upvotes: 1
Views: 78
Reputation: 172974
Below is for BigQuery Standard SQL
#standardSQL
SELECT dt, 1 + SUM(start) OVER(ORDER BY dt) journey
FROM (
SELECT dt, IF(TIMESTAMP_DIFF(dt, LAG(dt) OVER(ORDER BY dt), MINUTE) > 40, 1, 0) start
FROM `project.dataset.table`
)
You can test, play with above using dummy data from your question as below
#standardSQL
WITH `project.dataset.table` AS (
SELECT TIMESTAMP '2018-04-03T08:06:04' dt UNION ALL
SELECT '2018-04-03T08:07:27' UNION ALL
SELECT '2018-04-03T08:18:18' UNION ALL
SELECT '2018-04-03T10:08:27' UNION ALL
SELECT '2018-04-03T10:22:24' UNION ALL
SELECT '2018-04-03T12:08:50' UNION ALL
SELECT '2018-04-03T12:24:49' UNION ALL
SELECT '2018-04-03T12:24:51'
)
SELECT dt, 1 + SUM(start) OVER(ORDER BY dt) journey
FROM (
SELECT dt, IF(TIMESTAMP_DIFF(dt, LAG(dt) OVER(ORDER BY dt), MINUTE) > 40, 1, 0) start
FROM `project.dataset.table`
)
-- ORDER BY dt
with result
Row dt journey
1 2018-04-03 08:06:04 UTC 1
2 2018-04-03 08:07:27 UTC 1
3 2018-04-03 08:18:18 UTC 1
4 2018-04-03 10:08:27 UTC 2
5 2018-04-03 10:22:24 UTC 2
6 2018-04-03 12:08:50 UTC 3
7 2018-04-03 12:24:49 UTC 3
8 2018-04-03 12:24:51 UTC 3
Note: I derived logic from below statement in your question
I tried by adding +- 40 mins to time and all transactions coming in the range will be one unique journey
A liiiittle less verbose version:
#standardSQL
SELECT dt, 1 + COUNTIF(start) OVER(ORDER BY dt) journey
FROM (
SELECT dt, (TIMESTAMP_DIFF(dt, LAG(dt) OVER(ORDER BY dt), MINUTE) > 40) start
FROM `project.dataset.table`
)
ORDER BY dt
Upvotes: 1