Rahulkg007
Rahulkg007

Reputation: 77

Partition (segregate) by time period span in SQL

My question is to segregate data from time information in SQL

I have transaction data for trams. Something like this

DateTime
2018-04-03T08:06:04
2018-04-03T08:07:27
2018-04-03T08:18:18
2018-04-03T10:08:27
2018-04-03T10:22:24
2018-04-03T12:08:50
2018-04-03T12:24:49
2018-04-03T12:24:51

This is the customer tap on information for specific tram on specific day. This contains 3 journey of the same tram

  1. First Journey first transaction is at 8:06, second at 8:07 and third at 8:18
  2. Second Journey first transaction is at 10:08, second at 10:22
  3. Third Journey first transaction is at 12:08, second at 12:24 and third at 24:24

How can I rank them as 3 different journeys so desired result will be like this

Rank    DateTime
1   2018-04-03T08:06:04
1   2018-04-03T08:07:27
1   2018-04-03T08:18:18
2   2018-04-03T10:08:27
2   2018-04-03T10:22:24
3   2018-04-03T12:08:50
3   2018-04-03T12:24:49
3   2018-04-03T12:24:51

I tried by adding +- 40 mins to time and all transactions coming in the range will be one unique journey. But couldn't succeed.

Upvotes: 1

Views: 78

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172974

Below is for BigQuery Standard SQL

#standardSQL
SELECT dt, 1 + SUM(start) OVER(ORDER BY dt) journey 
FROM (
  SELECT dt, IF(TIMESTAMP_DIFF(dt, LAG(dt) OVER(ORDER BY dt), MINUTE) > 40, 1, 0) start
  FROM `project.dataset.table`
)

You can test, play with above using dummy data from your question as below

#standardSQL
WITH `project.dataset.table` AS (
  SELECT TIMESTAMP '2018-04-03T08:06:04' dt UNION ALL
  SELECT '2018-04-03T08:07:27' UNION ALL
  SELECT '2018-04-03T08:18:18' UNION ALL
  SELECT '2018-04-03T10:08:27' UNION ALL
  SELECT '2018-04-03T10:22:24' UNION ALL
  SELECT '2018-04-03T12:08:50' UNION ALL
  SELECT '2018-04-03T12:24:49' UNION ALL
  SELECT '2018-04-03T12:24:51' 
)
SELECT dt, 1 + SUM(start) OVER(ORDER BY dt) journey 
FROM (
  SELECT dt, IF(TIMESTAMP_DIFF(dt, LAG(dt) OVER(ORDER BY dt), MINUTE) > 40, 1, 0) start
  FROM `project.dataset.table`
)
-- ORDER BY dt 

with result

Row dt                          journey  
1   2018-04-03 08:06:04 UTC     1    
2   2018-04-03 08:07:27 UTC     1    
3   2018-04-03 08:18:18 UTC     1    
4   2018-04-03 10:08:27 UTC     2    
5   2018-04-03 10:22:24 UTC     2    
6   2018-04-03 12:08:50 UTC     3    
7   2018-04-03 12:24:49 UTC     3    
8   2018-04-03 12:24:51 UTC     3      

Note: I derived logic from below statement in your question

I tried by adding +- 40 mins to time and all transactions coming in the range will be one unique journey

A liiiittle less verbose version:

#standardSQL
SELECT dt, 1 + COUNTIF(start) OVER(ORDER BY dt) journey 
FROM (
  SELECT dt, (TIMESTAMP_DIFF(dt, LAG(dt) OVER(ORDER BY dt), MINUTE) > 40) start
  FROM `project.dataset.table`
)
ORDER BY dt

Upvotes: 1

Related Questions