Reputation: 83
So I have catalogued data in an S3 bucket which very similarly resembles this data here:
+-----+-----------+---------+-----------------------+
| id | title | event | time |
+-----+-----------+---------+-----------------------+
|1 | book A | BORROW | 2018-07-01 09:00:00 |
|1 | book A | RETURN | 2018-08-01 09:00:00 |
|2 | book B | BORROW | 2018-08-01 13:00:00 |
|2 | book B | RETURN | 2018-10-01 17:00:00 |
|1 | book A | BORROW | 2018-11-01 09:00:00 |
|1 | book A | RETURN | 2018-12-01 09:00:00 |
+-----+-----------+---------------------------------+
I basically want to be able to write a SELECT statement in Amazon Athena which displays the borrow and return times next to one another in a row like this:
+-----+-----------+-----------------------+-----------------------+
| id | title | borrow_time | return_time |
+-----+-----------+-----------------------+-----------------------+
|1 | book A | 2018-07-01 09:00:00 | 2018-08-01 09:00:00 |
|2 | book B | 2018-08-01 13:00:00 | 2018-10-01 17:00:00 |
|1 | book A | 2018-11-01 09:00:00 | 2018-12-01 09:00:00 |
+-----+-----------+-----------------------+-----------------------+
I've been spending heaps of time writing around 5 different queries (using things like OUTER APPLY
, but Athena seems to be really sensitive to work with, especially considering that it doesn't have any functionality with OUTER APPLY
. This is the logic for my latest statement:
SELECT b.id,
b.title,
b.time AS borrow_time,
MIN(r.time) AS return_time
FROM (
SELECT id,
title,
time
FROM books
WHERE event = 'BORROW'
) b
OUTER JOIN (
SELECT id,
time
FROM books
WHERE event = 'RETURN'
) r
ON b.id = r.id
AND b.time < r.time
GROUP BY b.id,
b.title,
borrow_time
ORDER BY borrow_time;
Any ideas to get around this would be greatly appreciated!
Upvotes: 2
Views: 7186
Reputation: 37473
Try with CASE WHEN and row_number() function :
with pcte as
(
SELECT id,
title,event,time, row_number() over(order by id,title,event) as rn
FROM books
)
SELECT id,
title,
case when event = 'BORROW' then b.time end AS borrow_time,
case when event = 'RETURN' then b.time end AS return_time
FROM pcte order by id, title, rn
Upvotes: 0
Reputation: 1269503
Assuming the borrows and returns are all paired, you can enumerate them and then use conditional aggregation:
select id, title,
max(case when event = 'BORROW' then b.time end) as borrow_time,
max(case when event = 'RETURN' then b.time end) as return_time
from (select b.*,
row_number() over (partition by b.id, b.event order by b.time) as sequm
from books b
) b
group by id, title, seqnum
order by id, title, seqnum;
Upvotes: 1