Andi-lo
Andi-lo

Reputation: 2312

Getting amount of active trips via postgresql loop

I am searching for a way to get the amount of active trips in a GTFS feed for each minute of a day through a postgresql query.

For every trip I have the start and endtimes (in seconds) in a denormalized table. It looks something like this:

denormalized_trips table

The query to give me trips that are active for a given timeframe (e.g. here 43000 to 43600 seconds) looks like this:

SELECT
COUNT(trips.trip_id)
FROM denormalized_trips AS trips

LEFT JOIN gtfs_calendar_dates AS calendar_dates
ON calendar_dates.service_id = trips.service_id
AND calendar_dates.agency_key = trips.agency_key
AND date = '2017-07-03'
AND exception_type = 1

INNER JOIN gtfs_calendar AS calendar
  ON trips.service_id = calendar.service_id
  AND calendar.agency_key = trips.agency_key
  AND calendar.wednesday = 1

WHERE (
  trip_start_time BETWEEN 46800 AND 47100
  AND '2017-07-03' BETWEEN calendar.start_date AND calendar.end_date
)
AND NOT EXISTS (
  SELECT 0
  FROM gtfs_calendar_dates AS date_exceptions
  WHERE date = '2017-07-03'
  AND date_exceptions.agency_key = trips.agency_key
  AND date_exceptions.service_id = calendar.service_id
  AND exception_type = 2
);

This will result in 12 trips that will start between 13:00pm and 13:05pm.

Now I want to do that for the whole day. I want to get the amount of trips that get active in an interval of e.g. 1 minute or maybe more like 5 minutes. I tried it with a loop but this seems to just give me 1 result back. Here is what I came up with for now:

CREATE OR REPLACE FUNCTION get_active_trips(n int)
RETURNS INTEGER AS
$BODY$
DECLARE 
 count INTEGER;
BEGIN 
  FOR counter IN 43130..50000 BY 60 LOOP
    SELECT
    COUNT(trips.trip_id)
    INTO count
    FROM denormalized_trips AS trips

    LEFT JOIN gtfs_calendar_dates AS calendar_dates
    ON calendar_dates.service_id = trips.service_id
    AND calendar_dates.agency_key = trips.agency_key
    AND date = '2017-07-03'
    AND exception_type = 1

    INNER JOIN gtfs_calendar AS calendar
      ON trips.service_id = calendar.service_id
      AND calendar.agency_key = trips.agency_key
      AND calendar.wednesday = 1

    WHERE (
      trip_start_time BETWEEN counter AND counter + 60
      AND '2017-07-03' BETWEEN calendar.start_date AND calendar.end_date
    )
    AND NOT EXISTS (
      SELECT 0
      FROM gtfs_calendar_dates AS date_exceptions
      WHERE date = '2017-07-03'
      AND date_exceptions.agency_key = trips.agency_key
      AND date_exceptions.service_id = calendar.service_id
      AND exception_type = 2
    );
  END LOOP;
  RETURN count;
END; 
$BODY$ LANGUAGE plpgsql STABLE;

The result of calling SELECT get_active_trips(1);

sum of entries

Now I would like to get something like a table or an array of results back instead of just 1 entry. How would I do that?

Any help is highly appreciated.

Upvotes: 1

Views: 151

Answers (2)

Abelisto
Abelisto

Reputation: 15614

There are two syntaxes to create function returning set of values: returns setof <type> and returns table(<columns definition>). Documentation.

There are also several ways to return those values from the plpgsql function: return next or return query. Documentation.

So if you want just get the series of integers you could to rewrite your function in that way:

CREATE OR REPLACE FUNCTION get_active_trips(n int)
RETURNS SETOF INTEGER AS
$BODY$
DECLARE 
 count INTEGER;
BEGIN 
  FOR counter IN 43130..50000 BY 60 LOOP
    SELECT
    COUNT(trips.trip_id)
    INTO count
    FROM denormalized_trips AS trips
    <rest of query here>
    ;
    RETURN NEXT count;
  END LOOP;
  RETURN;
END; 
$BODY$ LANGUAGE plpgsql STABLE;

or, using RETURN QUERY:

CREATE OR REPLACE FUNCTION get_active_trips(n int)
RETURNS SETOF INTEGER AS
$BODY$
BEGIN 
  FOR counter IN 43130..50000 BY 60 LOOP
    RETURN QUERY
      SELECT
      COUNT(trips.trip_id)
      FROM denormalized_trips AS trips
      <rest of query here>
      ;
  END LOOP;
  RETURN;
END; 
$BODY$ LANGUAGE plpgsql STABLE;

In case if you want to return more then one column:

CREATE OR REPLACE FUNCTION get_active_trips(n int)
RETURNS TABLE (counter_value int, active_trips_count int) AS
$BODY$
BEGIN 
  FOR counter IN 43130..50000 BY 60 LOOP
    SELECT
    COUNT(trips.trip_id)
    INTO active_trips_count
    FROM denormalized_trips AS trips
    <rest of query here>
    ;
    counter_value := counter;
    RETURN NEXT; -- There is no parameters, current values of counter_value and active_trips_count will be returned
  END LOOP;
  RETURN;
END; 
$BODY$ LANGUAGE plpgsql STABLE;

or, using RETURN QUERY:

CREATE OR REPLACE FUNCTION get_active_trips(n int)
RETURNS TABLE (counter_value int, active_trips_count int) AS
$BODY$
BEGIN 
  FOR counter IN 43130..50000 BY 60 LOOP
    RETURN QUERY
      SELECT
        counter,
        COUNT(trips.trip_id)
      FROM denormalized_trips AS trips
      <rest of query here>
      ;
  END LOOP;
  RETURN;
END; 
$BODY$ LANGUAGE plpgsql STABLE;

Finally there is alternative declaration for returns table:

CREATE OR REPLACE FUNCTION get_active_trips(
  n int,
  out counter_value int, 
  out active_trips_count int)
RETURNS SETOF RECORD AS

Update

But(!) I feeling that it is possible to simplify your task using single query, without loop.

Consider the following query (i'v used your simplified query from sqlfiddle):

select
  counter,
  count(trips.trip_id)
from
  generate_series(43130, 50000, 60) as counter left join
    denormalized_trips as trips on (trip_start_time between counter and counter + 60) 
group by counter 
order by counter;

Upvotes: 2

Anuraag Veerapaneni
Anuraag Veerapaneni

Reputation: 679

Create type like below

drop type if exists get_active_trips_out;
create type get_active_trips_out as
(
   Count int
);

use that table type while returning the data from the function like below

CREATE OR REPLACE FUNCTION get_active_trips(n int)
RETURNS setof get_active_trips_out AS
$BODY$
DECLARE 
 count INTEGER;
r get_active_trips_out%rowtype;
BEGIN 
 create temp table tmp_count
(
  Count int
)on commit drop;
  FOR counter IN 43130..50000 BY 60 LOOP
    Insert into tmp_count
    SELECT
    COUNT(trips.trip_id)
    FROM denormalized_trips AS trips

    LEFT JOIN gtfs_calendar_dates AS calendar_dates
    ON calendar_dates.service_id = trips.service_id
    AND calendar_dates.agency_key = trips.agency_key
    AND date = '2017-07-03'
    AND exception_type = 1

    INNER JOIN gtfs_calendar AS calendar
      ON trips.service_id = calendar.service_id
      AND calendar.agency_key = trips.agency_key
      AND calendar.wednesday = 1

    WHERE (
      trip_start_time BETWEEN counter AND counter + 60
      AND '2017-07-03' BETWEEN calendar.start_date AND calendar.end_date
    )
    AND NOT EXISTS (
      SELECT 0
      FROM gtfs_calendar_dates AS date_exceptions
      WHERE date = '2017-07-03'
      AND date_exceptions.agency_key = trips.agency_key
      AND date_exceptions.service_id = calendar.service_id
      AND exception_type = 2
    );
  END LOOP;
 for r in 
 Select * from tmp_count
 loop
 return next r
 end loop;
END; 

Upvotes: 0

Related Questions