arilwan
arilwan

Reputation: 3993

PostgreSQL: aggregate records by time interval

I would like to generate reports of GPS capturing rate by travel modes.

I have in a table modes the type of travel modes used by users.

CREATE TABLE modes
(
    user_id integer NOT NULL,
    trip_id int,
    start_time timestamp with time zone NOT NULL,
    end_time timestamp with time zone NOT NULL,
    travelmode text ,
    PRIMARY KEY (user_id, start_time, end_time)
)

So for example, the following is the sample data for travel mode by user 10 for different trips.

INSERT INTO modes (user_id, trip_id, start_time, end_time, travelmode)
VALUES (10,1000,'2008-06-18 13:28:18+01','2008-06-18 13:32:20+01','bus'),
      (10,1001,'2008-06-18 14:47:35+01','2008-06-18 15:05:31+01','bus'),
      (10,1002,'2008-08-01 02:51:47+01','2008-08-01 03:37:43+01','metro'),
      (10,1003,'2008-08-01 03:59:36+01','2008-08-01 04:30:20+01','metro'),
      (10,1004,'2008-08-01 05:20:07+01','2008-08-01 07:03:51+01','car'),
      (10,1005,'2008-08-01 07:17:08+01','2008-08-01 08:06:26+01','bus'),
      (10,1006,'2008-09-15 23:54:20+01','2008-09-16 00:02:44+01','bus'),
      (10,1007,'2008-09-16 00:10:22+01','2008-09-16 00:28:29+01','bus'),
      (10,1008,'2008-09-16 00:58:43+01','2008-09-16 01:07:14+01','metro')

And then for each user and for each trip, user's GPS traces are recorded in a table plt_distinct:

CREATE TABLE plt_distinct
(
    user_id int,
    trip_id int, 
    logtime timestamp with time zone NOT NULL,
    lat double precision NOT NULL,
    lon double precision NOT NULL,
    alt double precision,
    PRIMARY KEY (trip_id, logtime)
)

Like so, for the user given in sample data above, following are the sample GPS traces for a particular trip:

INSERT INTO plt_distinct (user_id, trip_id, logtime, lat, lon, alt)
VALUES (10,1002,'2008-06-18 04:46:20+01',39.940474,116.346754,233),
      (10,1002,'2008-06-18 04:46:21+01',39.940491,116.346745,233),
      (10,1002,'2008-06-18 04:46:23+01',39.940526,116.346734,233),
      (10,1002,'2008-06-18 04:46:25+01',39.940573,116.346725,233),
      (10,1002,'2008-06-18 04:46:31+01',39.940815,116.346688,230),
      (10,1002,'2008-06-18 04:46:32+01',39.940861,116.346661,230),
      (10,1002,'2008-06-18 04:46:33+01',39.940941,116.346599,233),
      (10,1002,'2008-06-18 04:46:35+01',39.941109,116.34658,233),
      (10,1002,'2008-06-18 04:46:39+01',39.941464,116.346561,240),
      (10,1002,'2008-06-18 04:46:40+01',39.941558,116.346521,246),
      (10,1002,'2008-06-18 04:46:42+01',39.941816,116.346438,259)

The given sample are traces by metro mode. For analysis purposes, I am interested in aggregating the GPS traces interval for each mode (especially metro as GPS is not available underground).

I make available, these tables and the sample data in this DB-fiddle.

The expected result is something like this:

+-----------------------+---------------+-----------------+-----------------+----------------+
| count of metro(total) | interval (1s) | interval (2-5s) | interval(6-10s) | interval(>10s) |
+-----------------------+---------------+-----------------+-----------------+----------------+
|                    10 |             4 |               5 |               1 |              0 |
+-----------------------+---------------+-----------------+-----------------+----------------+

Upvotes: 2

Views: 447

Answers (2)

Akhilesh Mishra
Akhilesh Mishra

Reputation: 6140

You should use LAG() and Filter clause to achieve this:

Try This:

select 
count(*) filter (where time_>0) "count of metro(total)",
count(*) filter (where time_=1) "interval (1s)",
count(*) filter (where time_ between 2 and 5) "interval (2-5s)",
count(*) filter (where time_ between 6 and 10) "interval(6-10s)",
count(*) filter (where time_ >10) "interval(>10s)"
from 
(
select 
coalesce(extract (epoch from (t1.logtime- lag(t1.logtime) over (partition by t1.trip_id order by t1.trip_id, t1.logtime))),0) as "time_"
                                                       
from plt_distinct t1 

inner join modes t2 on t1.user_id=t2.user_id and t1.trip_id=t2.trip_id 
where t2.travelmode='metro'
) t                             

DEMO of Fiddle

Upvotes: 2

shoek
shoek

Reputation: 400

with ordered_plt as (
 select
  p.*
  ,row_number() over (partition by p.trip_id order by p.logtime) as rn
 from plt_distinct as p
 inner join modes
 on p.trip_id = modes.trip_id
 where modes.travelmode = 'metro'
), plt_with_prev as (
 select
  ord.*
  ,prev.logtime as prev_logtime
 from ordered_plt as ord
 inner join ordered_plt as prev
 on ord.trip_id = prev.trip_id and ord.rn-1 = prev.rn
), elapsed as (
 select
  extract(second from (pwp.logtime - pwp.prev_logtime)) as sec
 from plt_with_prev as pwp
)
select
 count(*) as total
 ,sum(case when sec <= 1 then 1 else 0 end) as zero_to_1
 ,sum(case when 1 < sec and sec <= 5 then 1 else 0 end) as one_to_5
 ,sum(case when 5 < sec and sec <= 10 then 1 else 0 end) as five_to_10
 ,sum(case when 10 < sec then 1 else 0 end) as ten_to_inf
from elapsed

db<>fiddle

Things are much more easier if you add column prev_logtime to table plt_distinct.

Upvotes: 2

Related Questions