Reputation: 6087
I am modeling (in Postgres 9.6.1 / postGIS 2.3.1) a booking system for local services provided by suppliers:
create table supplier (
id serial primary key,
name text not null check (char_length(title) < 280),
type service_type,
duration interval,
...
geo_position geography(POINT,4326)
...
);
Each supplier keeps a calendar with time slots when he/she is available to be booked:
create table timeslot (
id serial primary key,
supplier_id integer not null references supplier(id),
slot tstzrange not null,
constraint supplier_overlapping_timeslot_not_allowed
exclude using gist (supplier_id with =, slot with &&)
);
For when a client wants to know which nearby suppliers are available to book at a certain time, I create a view and a function:
create view supplier_slots as
select
supplier.name, supplier.type, supplier.geo_position, supplier.duration, ...
timeslot.slot
from
supplier, timeslot
where
supplier.id = timeslot.supplier_id;
create function find_suppliers(wantedType service_type, near_latitude text, near_longitude text, at_time timestamptz)
returns setof supplier_slots as $$
declare
nearpoint geography;
begin
nearpoint := ST_GeographyFromText('SRID=4326;POINT(' || near_latitude || ' ' || near_longitude || ')');
return query
select * from supplier_slots
where type = wantedType
and tstzrange(at_time, at_time + duration) <@ slot
order by ST_Distance( nearpoint, geo_position )
limit 100;
end;
$$ language plpgsql;
All this works really well.
Now, for the suppliers that did NOT have a bookable time slot at the requested time, I would like to find their closest available timeslots, before and after the requested at_time
, also sorted by distance.
This has my mind spinning a little bit and I can't find any suitable operators to give me the nearest tsrange.
Any ideas on the smartest way to do this?
Upvotes: 2
Views: 1220
Reputation: 656992
The solution depends on the exact definition of what you want.
I suggest these slightly adapted table definitions to make the task simpler, enforce integrity and improve performance:
CREATE TABLE supplier (
supplier_id serial PRIMARY KEY,
supplier text NOT NULL CHECK (length(title) < 280),
type service_type,
duration interval,
geo_position geography(POINT,4326)
);
CREATE TABLE timeslot (
timeslot_id serial PRIMARY KEY,
supplier_id integer NOT NULL -- references supplier(id),
slot_a timestamptz NOT NULL,
slot_z timestamptz NOT NULL,
CONSTRAINT timeslot_range_valid CHECK (slot_a < slot_z)
CONSTRAINT timeslot_no_overlapping
EXCLUDE USING gist (supplier_id WITH =, tstzrange(slot_a, slot_z) WITH &&)
);
CREATE INDEX timeslot_slot_z ON timeslot (supplier_id, slot_z);
CREATE INDEX supplier_geo_position_gist ON supplier USING gist (geo_position);
Save two timestamptz
columns slot_a
and slot_z
instead of the tstzrange
column slot
- and adapt constraints accordingly. This treats all ranges as default inclusive lower and exclusive upper bounds automatically now - which avoids corner case errors / headache.
Collateral benefit: only 16 bytes for 2 timestamptz
instead of 25 bytes (32 with padding) for the tstzrange
.
All queries you might have had on slot
keep working with tstzrange(slot_a, slot_z)
as drop-in replacement.
Add an index on (supplier_id, slot_z)
for the query at hand.
And a spatial index on supplier.geo_position
(which you probably have already).
Depending on data distribution in type
, a couple of partial indexes for types common in queries might help performance:
CREATE INDEX supplier_geo_type_foo_gist ON supplier USING gist (geo_position)
WHERE supplier = 'foo'::service_type;
This query finds the X closest suppliers who offer the correct service_type
(100 in the example), each with the one closest matching time slot (defined by the time distance to the start of the slot). I combined this with actually matching slots, which may or may not be what you need.
CREATE FUNCTION f_suppliers_nearby(_type service_type, _lat text, _lon text, at_time timestamptz)
RETURNS TABLE (supplier_id int
, name text
, duration interval
, geo_position geography(POINT,4326)
, distance float
, timeslot_id int
, slot_a timestamptz
, slot_z timestamptz
, time_dist interval
) AS
$func$
WITH sup_nearby AS ( -- find matching or later slot
SELECT s.id, s.name, s.duration, s.geo_position
, ST_Distance(ST_GeographyFromText('SRID=4326;POINT(' || _lat || ' ' || _lon || ')')
, geo_position) AS distance
, t.timeslot_id, t.slot_a, t.slot_z
, CASE WHEN t.slot_a IS NOT NULL
THEN GREATEST(t.slot_a - at_time, interval '0') END AS time_dist
FROM supplier s
LEFT JOIN LATERAL (
SELECT *
FROM timeslot
WHERE supplier_id = supplier_id
AND slot_z > at_time + s.duration -- excl. upper bound
ORDER BY slot_z
LIMIT 1
) t ON true
WHERE s.type = _type
ORDER BY s.distance
LIMIT 100
)
SELECT *
FROM (
SELECT DISTINCT ON (supplier_id) * -- 1 slot per supplier
FROM (
TABLE sup_nearby -- matching or later slot
UNION ALL -- earlier slot
SELECT s.id, s.name, s.duration, s.geo_position
, s.distance
, t.timeslot_id, t.slot_a, t.slot_z
, GREATEST(at_time - t.slot_a, interval '0') AS time_dist
FROM sup_nearby s
CROSS JOIN LATERAL ( -- this time CROSS JOIN!
SELECT *
FROM timeslot
WHERE supplier_id = s.supplier_id
AND slot_z <= at_time -- excl. upper bound
ORDER BY slot_z DESC
LIMIT 1
) t
WHERE s.time_dist IS DISTINCT FROM interval '0' -- exact matches are done
) sub
ORDER BY supplier_id, time_dist -- pick temporally closest slot per supplier
) sub
ORDER BY time_dist, distance; -- matches first, ordered by distance; then misses, ordered by time distance
$func$ LANGUAGE sql;
I did not use your view supplier_slots
and optimized for performance instead. The view may still be convenient. You might include tstzrange(slot_a, slot_z) AS slot
for backward compatibility.
The basic query to find the 100 closest suppliers is a textbook "K Nearest Neighbour" problem. A GiST index works well for this. Related:
The additional task (find the temporally nearest slot) can be split in two tasks: to find the next higher and the next lower row. The core feature of the solution is to have two subqueries with ORDER BY slot_z LIMIT 1
and ORDER BY slot_z DESC LIMIT 1
, which result in two very fast index scans.
I combined the first one with finding actual matches, which is a (smart, I think) optimization, but may distract from the actual solution.
Upvotes: 2