Reputation: 3094
I have two tables:
user journeys
id  timestamp  bus
1   00:10      12
1   16:10      12
2   14:00      23
bus
id  timestamp  price
12  00:00      1.3
12  00:10      1.5
12  00:20      1.7
12  18:00      2.0
13  00:00      3.0
My goal is to find how much each user spent on travel today.
In our case, user 1 took bus number 12 at 00:10 and paid 1.5, then took it again at 16:10, by which time the price had increased to 1.7. In total, this person paid 3.2 today. We always use the most recent price set at or before the journey time.
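To make the pricing rule concrete, here is a minimal sketch in plain Python of the lookup described above: for each journey, take the latest price whose change time is at or before the journey time. The names `price_changes`, `price_at` and `journeys_user_1` are made up for illustration, and the data is just bus 12 from the table above.

```python
from bisect import bisect_right
from decimal import Decimal

# Price history for bus 12 from the question, sorted by change time.
# Zero-padded HH:MM strings sort in chronological order.
price_changes = [
    ("00:00", Decimal("1.3")),
    ("00:10", Decimal("1.5")),
    ("00:20", Decimal("1.7")),
    ("18:00", Decimal("2.0")),
]
change_times = [t for t, _ in price_changes]


def price_at(journey_time):
    """Return the latest price set at or before journey_time, or None if there is none."""
    idx = bisect_right(change_times, journey_time) - 1
    return price_changes[idx][1] if idx >= 0 else None


# User 1 took bus 12 twice today.
journeys_user_1 = ["00:10", "16:10"]
total_user_1 = sum(price_at(t) for t in journeys_user_1)
print(total_user_1)  # 3.2
```

Decimal is used here only so that 1.5 + 1.7 prints as exactly 3.2.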
I've done this using a massive subquery and it looks inefficient. Does anyone have a slick solution?
Please see http://sqlfiddle.com/#!17/10ad6/2
Or Build Schema:
-- Journeys
drop table if exists journeys;
create table journeys(
    id numeric,
    timestamp timestamp without time zone,
    bus numeric
);
truncate table journeys;
insert into journeys
values
    (1, '2018-08-22 00:10:00', 12),
    (1, '2018-08-22 16:10:00', 12),
    (2, '2018-08-22 14:00:00', 23);
-- Bus Prices
drop table if exists bus;
create table bus (
    bus_id int,
    timestamp timestamp without time zone,
    price numeric
);
truncate table bus;
insert into bus
values
    (12, '2018-08-22 00:00:00', 1.3),
    (12, '2018-08-22 00:10:00', 1.5),
    (12, '2018-08-22 00:20:00', 1.7),
    (12, '2018-08-22 18:00:00', 2.0),
    (13, '2018-08-22 00:00:00', 3.0);
Upvotes: 1
Views: 56
Reputation: 1726
You may also do this using an inner join and window functions:
SELECT user_id, SUM(price) AS total_spent
FROM
(
    SELECT user_id, journey_timestamp, bus_id, price_change_timestamp, price,
           -- Partition per journey, not just per bus: after the join every price row
           -- is repeated once per journey, so partitioning by bus_id alone leaves tied
           -- timestamps and LEAD can return the current row's own timestamp.
           COALESCE(LEAD(price_change_timestamp) OVER (PARTITION BY user_id, journey_timestamp, bus_id ORDER BY price_change_timestamp),
                    CAST('2100-01-01 00:00:00' AS TIMESTAMP)) AS next_price_timestamp
    FROM
    (
        SELECT a.id AS user_id, a.timestamp AS journey_timestamp, a.bus AS bus_id, b.timestamp AS price_change_timestamp, b.price
        FROM journeys a
        INNER JOIN bus b
        ON a.bus = b.bus_id
    ) a1
) a2
WHERE journey_timestamp >= price_change_timestamp AND journey_timestamp < next_price_timestamp
GROUP BY user_id;
This is essentially what is happening:
1) The inner query joins the tables, so that each journey is matched to every fare the bus has had at any point in time. Because it is an inner join, journeys on a bus with no fare rows (user 2 on bus 23 here) drop out.
2) The LEAD function, partitioned by journey and bus and ordered by the times at which the fare changed, creates a "window" during which each fare is valid. The COALESCE hack works around the NULL that LEAD produces for the most recent fare, which has no following change.
3) We keep only the rows where the journey timestamp lies within the "window", and sum the fares for each user with a GROUP BY.
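To see what the "window" in step 2 looks like, the following sketch runs LEAD over the bus table alone and prints the validity interval of each fare. It is an illustration under assumptions, not the query above: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL (window functions need SQLite 3.25 or later), stores timestamps as ISO-8601 text, and uses 00:00 for the first fare as in the question's table. The column aliases `valid_from` and `valid_to` are made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bus (bus_id INTEGER, timestamp TEXT, price REAL);
    INSERT INTO bus VALUES
        (12, '2018-08-22 00:00:00', 1.3),
        (12, '2018-08-22 00:10:00', 1.5),
        (12, '2018-08-22 00:20:00', 1.7),
        (12, '2018-08-22 18:00:00', 2.0),
        (13, '2018-08-22 00:00:00', 3.0);
""")

# Each fare is valid from its own timestamp until the next change for the same bus.
# LEAD returns NULL for the latest fare, which is why the answer wraps it in COALESCE.
windows = conn.execute("""
    SELECT bus_id,
           timestamp AS valid_from,
           LEAD(timestamp) OVER (PARTITION BY bus_id ORDER BY timestamp) AS valid_to,
           price
    FROM bus
    ORDER BY bus_id, timestamp
""").fetchall()

for row in windows:
    print(row)
```

The last fare of each bus comes back with `valid_to` set to None (NULL), which is the open-ended window that the far-future date in the COALESCE closes.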
Upvotes: 2
Reputation: 1270773
I don't know that this is faster than your solution (which you don't show). A correlated subquery seems like a reasonable solution.
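Since the question does not show its query, here is one way a correlated subquery could look. This is a sketch under assumptions, not the asker's actual solution: it runs against an in-memory SQLite database through Python's sqlite3 module as a stand-in for PostgreSQL, stores timestamps as ISO-8601 text, and uses 00:00 for the first fare as in the question's table. The alias `priced` is made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE journeys (id INTEGER, timestamp TEXT, bus INTEGER);
    CREATE TABLE bus (bus_id INTEGER, timestamp TEXT, price REAL);
    INSERT INTO journeys VALUES
        (1, '2018-08-22 00:10:00', 12),
        (1, '2018-08-22 16:10:00', 12),
        (2, '2018-08-22 14:00:00', 23);
    INSERT INTO bus VALUES
        (12, '2018-08-22 00:00:00', 1.3),
        (12, '2018-08-22 00:10:00', 1.5),
        (12, '2018-08-22 00:20:00', 1.7),
        (12, '2018-08-22 18:00:00', 2.0),
        (13, '2018-08-22 00:00:00', 3.0);
""")

# For each journey, the subquery picks the latest fare at or before the journey time;
# the outer query then sums those fares per user.
totals = conn.execute("""
    SELECT user_id, SUM(price) AS total_spent
    FROM (
        SELECT j.id AS user_id,
               (SELECT b.price
                FROM bus b
                WHERE b.bus_id = j.bus
                  AND b.timestamp <= j.timestamp
                ORDER BY b.timestamp DESC
                LIMIT 1) AS price
        FROM journeys j
    ) priced
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

for user_id, total_spent in totals:
    print(user_id, total_spent)
```

User 1 comes out at roughly 3.2 (the prices are floating point in this sketch), and user 2 gets NULL because bus 23 has no fare rows.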
But another method is:
SELECT j.*, b.price
FROM journeys j LEFT JOIN
     (SELECT b.*, LEAD(timestamp) OVER (PARTITION BY bus_id ORDER BY timestamp) as next_timestamp
      FROM bus b
     ) b
     ON b.bus_id = j.bus AND
        j.timestamp >= b.timestamp AND
        (j.timestamp < b.next_timestamp OR b.next_timestamp IS NULL);
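The query above returns one row per journey with its price. To get the per-user total the question asks for, one option is to aggregate over the same join. The sketch below does that under a few assumptions: it runs on an in-memory SQLite database through Python's sqlite3 module as a stand-in for PostgreSQL (SQLite 3.25 or later for LEAD), stores timestamps as ISO-8601 text, and uses 00:00 for the first fare as in the question's table. The alias `total_spent` is made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE journeys (id INTEGER, timestamp TEXT, bus INTEGER);
    CREATE TABLE bus (bus_id INTEGER, timestamp TEXT, price REAL);
    INSERT INTO journeys VALUES
        (1, '2018-08-22 00:10:00', 12),
        (1, '2018-08-22 16:10:00', 12),
        (2, '2018-08-22 14:00:00', 23);
    INSERT INTO bus VALUES
        (12, '2018-08-22 00:00:00', 1.3),
        (12, '2018-08-22 00:10:00', 1.5),
        (12, '2018-08-22 00:20:00', 1.7),
        (12, '2018-08-22 18:00:00', 2.0),
        (13, '2018-08-22 00:00:00', 3.0);
""")

# Same join as the answer's query, with SUM ... GROUP BY on top.
# The LEFT JOIN keeps users whose bus has no fare rows; their total is NULL.
totals_by_user = conn.execute("""
    SELECT j.id AS user_id, SUM(b.price) AS total_spent
    FROM journeys j LEFT JOIN
         (SELECT b.*, LEAD(timestamp) OVER (PARTITION BY bus_id ORDER BY timestamp) AS next_timestamp
          FROM bus b
         ) b
         ON b.bus_id = j.bus AND
            j.timestamp >= b.timestamp AND
            (j.timestamp < b.next_timestamp OR b.next_timestamp IS NULL)
    GROUP BY j.id
    ORDER BY j.id
""").fetchall()

for user_id, total_spent in totals_by_user:
    print(user_id, total_spent)
```

Because LEAD is computed on the bus table before the join, each fare appears once per bus and the windows do not overlap, so every journey matches at most one fare.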
Upvotes: 2