GRS

Reputation: 3094

SQL: Joining 2 tables on first matching row condition

I have two tables:

user journeys

id  timestamp     bus 
1       00:10      12
1       16:10      12
2       14:00      23

bus

id   timestamp    price
12   00:00        1.3
12   00:10        1.5
12   00:20        1.7
12   18:00        2.0
13   00:00        3.0

My goal is to find how much each user spent on travel today.

In this example, user 1 took bus 12 at 00:10 and paid 1.5, then took it again at 16:10, by which time the price had increased to 1.7. In total, this person paid 3.2 today. Each journey is charged the most recent price set at or before the journey's timestamp.

I've done this using a massive subquery and it looks inefficient. Does anyone have a slick solution?


Sample Data For Reproduction:


Please see http://sqlfiddle.com/#!17/10ad6/2

Or Build Schema:

drop table if exists journeys;
create table journeys(
id numeric,
timestamp timestamp without time zone,
bus numeric
);

truncate table journeys;
insert into journeys
values
(1, '2018-08-22 00:10:00', 12),
(1, '2018-08-22 16:10:00', 12),
(2, '2018-08-22 14:00:00', 23);

-- Bus Prices

drop table if exists bus;
create table bus (
bus_id int,
timestamp timestamp without time zone,
price numeric
);

truncate table bus;
insert into bus
values

(12, '2018-08-22 00:00:00', 1.3),
(12, '2018-08-22 00:10:00', 1.5),
(12, '2018-08-22 00:20:00', 1.7),
(12, '2018-08-22 18:00:00', 2.0),
(13, '2018-08-22 00:00:00', 3.0);

Upvotes: 1

Views: 56

Answers (2)

Vishnu Kunchur

Reputation: 1726

You may also do this using an inner join and windowing functions:

SELECT user_id, SUM(price) AS total_spent
FROM
(
    SELECT user_id, journey_timestamp, bus_id, price_change_timestamp,
           -- Partition per journey so that LEAD sees each fare change only once;
           -- partitioning by bus_id alone repeats fare rows when a bus has several journeys.
           COALESCE(LEAD(price_change_timestamp) OVER (PARTITION BY user_id, journey_timestamp, bus_id ORDER BY price_change_timestamp), CAST('2100-01-01 00:00:00' AS TIMESTAMP)) AS next_price_timestamp,
           price
    FROM
    (
        SELECT a.id AS user_id, a.timestamp AS journey_timestamp, a.bus AS bus_id, b.timestamp AS price_change_timestamp, b.price
        FROM journeys a
        INNER JOIN bus b
        ON a.bus = b.bus_id
    ) a1
) a2
WHERE journey_timestamp >= price_change_timestamp AND journey_timestamp < next_price_timestamp
GROUP BY user_id;

This is essentially what is happening:

1) The inner query joins the tables, so that each journey is matched to every fare its bus has ever had.

2) The LEAD function, ordered by the times when the bus fares changed, looks up the next fare change for each row, creating a "window" during which that fare is valid. The COALESCE substitutes a far-future timestamp for the NULL that LEAD returns on the latest fare.

3) We keep only the rows where the journey timestamp lies within the "window", and sum the fares for each user with a GROUP BY.
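
The "window" from step 2 can be inspected on its own by running LEAD against the bus table from the question's schema. This is only a minimal sketch; the aliases valid_from and valid_until are illustrative names, not part of the original schema.

```sql
-- Sketch: list each fare with the interval during which it applies.
-- valid_until is NULL for the most recent fare of each bus.
SELECT b.bus_id,
       b.price,
       b.timestamp AS valid_from,
       LEAD(b.timestamp) OVER (PARTITION BY b.bus_id ORDER BY b.timestamp) AS valid_until
FROM bus b
ORDER BY b.bus_id, valid_from;
```

For the bus 12 fares listed in the question's table (00:00, 00:10, 00:20, 18:00), this gives the windows [00:00, 00:10) at 1.3, [00:10, 00:20) at 1.5, [00:20, 18:00) at 1.7, and 18:00 onward at 2.0. The 00:10 journey therefore falls in the 1.5 window and the 16:10 journey in the 1.7 window.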

Upvotes: 2

Gordon Linoff

Reputation: 1270773

I don't know that this is faster than your solution (which you don't show). A correlated subquery seems like a reasonable solution.

But another method is:

SELECT j.*, b.price
FROM journeys j LEFT JOIN
     (SELECT b.*, LEAD(timestamp) OVER (PARTITION BY bus_id ORDER BY timestamp) as next_timestamp
      FROM bus b
     ) b
     ON b.bus_id = j.bus AND
        j.timestamp >= b.timestamp AND
        (j.timestamp < b.next_timestamp OR b.next_timestamp IS NULL);
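
The query above returns one price per journey rather than the per-user total the question asks for. One possible way to get the totals is to write the correlated lookup as a LATERAL join and aggregate it. This is a sketch, assuming PostgreSQL 9.3 or later; the aliases p and total_spent are illustrative.

```sql
-- Sketch: for each journey, pick the latest fare set at or before the
-- journey time, then sum those fares per user.
SELECT j.id AS user_id,
       SUM(p.price) AS total_spent
FROM journeys j
LEFT JOIN LATERAL (
    SELECT b.price
    FROM bus b
    WHERE b.bus_id = j.bus
      AND b.timestamp <= j.timestamp
    ORDER BY b.timestamp DESC   -- most recent fare first
    LIMIT 1
) p ON true
GROUP BY j.id
ORDER BY j.id;
```

With the fares in the question's table, user 1 should come out at 1.5 + 1.7 = 3.2, and user 2 gets NULL because bus 23 has no fare rows. If two fares for the same bus share a timestamp, which one LIMIT 1 picks is arbitrary, so the fare timestamps need to be unique per bus for a stable result.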

Upvotes: 2
