Reputation: 467
I am struggling to optimize my query where I need to combine orders data and events data using timestamp so that I attribute certain type of the event to the order in this way that the closest event (table A) will be attributed to order (table B).
Examples: Table A:
Table B:
Currently, I am doing it using cross join but it is not performant at all:
WITH ClosestMatch AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY b.Order ORDER BY UNIX_TIMESTAMP(b.Timestamp) - UNIX_TIMESTAMP(a.Timestamp)) AS Rank
FROM
TableB b
CROSS JOIN
TableA a
WHERE b.Timestamp <= a.Timestamp
)
SELECT
Order,
B_Timestamp,
A_Timestamp,
Event
FROM
ClosestMatch
WHERE
Rank = 1
ORDER BY
B_Timestamp;
Do you have any ideas how to improve it?
Upvotes: 0
Views: 39
Reputation: 41
drawing inspiration from Guru Stron's answer, you can add the LEAD window function to your event table--assuming events have unique timestamps. then set the join condition so that the timestamps in your order table fall between the timestamps in your events table.
WITH
table_a ("timestamp", "event") AS (VALUES
(TIMESTAMP '2024-01-01 10:00', 'Red'),
(TIMESTAMP '2024-01-01 12:00', 'Blue'),
(TIMESTAMP '2024-01-01 14:00', 'Pink')
),
table_b ("timestamp", "order") AS (VALUES
(TIMESTAMP '2024-01-01 11:00', 'Cat'),
(TIMESTAMP '2024-01-01 11:40', 'Dog'),
(TIMESTAMP '2024-01-02 09:10', 'Bird')
),
table_a_with_ranges AS (
SELECT
*,
LEAD("timestamp") OVER(ORDER BY "timestamp") AS next_timestamp
FROM
table_a
)
SELECT
b."order",
b."timestamp" AS b_timestamp,
a."timestamp" AS a_timestamp,
a.event
FROM
table_b AS b
JOIN table_a_with_ranges AS a
ON CASE
WHEN a.next_timestamp IS NOT NULL THEN b."timestamp" BETWEEN a."timestamp" AND a.next_timestamp
ELSE b."timestamp" >= a."timestamp"
END
Upvotes: 0
Reputation: 142833
There is not much you can do I think. There is one dirty hack you can try though. You can "linearize" your two tables into one, order it by the "shared" timestamp and then do some windows functions magic:
-- sample data
WITH table_a(Timestamp, Event) as (
values (timestamp '2024-01-01 10:00', 'Red'),
(timestamp '2024-01-01 12:00', 'Blue'),
(timestamp '2024-01-01 14:00', 'Pink')
),
table_b(Timestamp, "Order") as (
values (timestamp '2024-01-01 11:00', 'Cat'),
(timestamp '2024-01-01 11:40', 'Dog'),
(timestamp '2024-01-02 09:10', 'Bird')
),
-- query parts
merged AS (
select 'event' type,
null "order",
table_a.*
from table_a
union
select 'order' type,
table_b."Order",
table_b.Timestamp,
null event
from table_b
)
select "order",
timestamp,
event,
event_time
from(
SELECT
type,
"order",
timestamp,
lead(event) IGNORE NULLS over (order by timestamp) event,
lead(if(type='event', timestamp)) IGNORE NULLS over (order by timestamp) event_time
FROM
merged
)
where type = 'order'
Output (does not match the one from you query, which does not match your desired output as I stated in the comments, so I made a bit of a guess what you need):
order | timestamp | event | event_time |
---|---|---|---|
Cat | 2024-01-01 11:00:00.000 | Blue | 2024-01-01 12:00:00.000 |
Dog | 2024-01-01 11:40:00.000 | Blue | 2024-01-01 12:00:00.000 |
Bird | 2024-01-02 09:10:00.000 |
You can use lag
in the similar way to lead
to get the closest event from "other side" (i.e. before) if needed.
Upvotes: 0