Reputation: 45
I am doing a project for school and have been given a database over gps-recordings for three people during the course of a week. I am trying to group these recordings into trips based on the time between them. If a recording is within 300 seconds from the recording before it, they are considered to be part of the same trip, otherwise, they are considered part of different trips.
So far I have managed to calculate the time difference between a recording on row n and the one on row n-1 and I am now trying to create a function for merging the recordings intro trips. This would have been real easy in another programming language, but in this course we are using PostgreSQL which I am not that well versed in.
To solve this, I am trying to create a function with a variable that increases every time the time difference between two recordings is greater than 300 seconds and assigns each row to a trip based on the variable. This is as far as I have currently gotten, although at the moment, the variable resets X all the time, thus assigning all rows to trip 1...
CREATE OR REPLACE FUNCTION tripmerge(time_diff double precision)
RETURNs integer AS $$
declare
X integer := 1;
ID integer;
BEGIN
IF time_diff < 300 THEN
ID = X;
ELSE
ID =X;
X:=X+1;
END IF;
RETURN ID;
END;$$
LANGUAGE plpgsql;
How do I change so X
does not reset all the time? I am using PostgreSQL 9.1.
EDIT:
This is the table I am working with:
curr_rec (timestamp), prev_rec (timestamp), time_diff (double precision)
With this being a sample of the dataset:
'2013-11-14 05:22:33.991+01',null ,null
'2013-11-14 09:15:40.485+01','2013-11-14 05:22:33.991+01',13986.494
'2013-11-14 09:17:04.837+01','2013-11-14 09:15:40.485+01',84.352
'2013-11-14 09:17:43.055+01','2013-11-14 09:17:04.837+01',38.218
'2013-11-14 09:23:24.205+01','2013-11-14 09:17:43.055+01',341.15
The expected result would add a column:
tripID
1
2
2
2
3
And I think this fiddle should be working: http://sqlfiddle.com/#!1/4e3e5/1/0
Upvotes: 1
Views: 435
Reputation: 656784
This query uses only curr_rec
, not the other redundant, precomputed columns:
SELECT 1 + count(step OR NULL) OVER (ORDER BY curr_rec) AS trip_id
FROM (
SELECT curr_rec
,lag(curr_rec) OVER (ORDER BY curr_rec) AS prev_rec
,curr_rec - lag(curr_rec) OVER (ORDER BY curr_rec)
> interval '5 min' AS step
FROM timestamps
) x;
Key features are:
lag()
, which I use to see if the previous row is more than 5 minutes ago. (Just using an interval
for the comparison, no need to extract seconds)count()
- that's just the basic aggregate function with an OVER
clause.step OR NULL
only leaves TRUE
or NULL
, where only TRUE
is counted in a running count, thereby arriving at your desired result.SQL Fiddle (building on the one you provided).
Upvotes: 3