Reputation: 459
I am looking to retrieve the first row and last row over a window in HiveQL.
I know there are a couple ways to do this:
SELECT customer,
FIRST_VALUE(product) over (W),
FIRST_VALUE(time) over (W),
LAST_VALUE(product) over (W),
LAST_VALUE(time) over (W)
FROM table
WINDOW W AS (PARTITION BY customer ORDER BY COST)
WITH table_wRN AS
(
SELECT *,
row_number() over (partition by customer order by cost ASC) rn_B,
row_number() over (partition by customer order by cost DESC) rn_E
FROM table
),
table_first_last AS
(
SELECT *
FROM table_wRN
WHERE (rn_E=1 OR rn_B=1)
)
SELECT table_first.customer,
table_first.product, table_first.time,
table_last.product, table_last.time
FROM table_first_last as table_first WHERE table_first_last.rn_B=1
JOIN table_first_last as table_last WHERE table_first_last.rn_E=1
ON table_first.customer = table_last.customer
My questions:
FIRST_VALUE(product,time) OVER (W) AS product_first, time_first
Thank you!
Upvotes: 1
Views: 3213
Reputation: 1269743
I am almost certain that the first would be more efficient. I mean two window functions versus two window functions, filtering and two joins?
Once you multiply the number of columns, then there might be an issue of which is faster. That said, look at the execution plan. I would expect that all window functions using the same window frame specification would use the same "windows" processing, with just tweaks for each value.
Hive does not have very good support for complex data types such as strings and arrays. In databases that do, it is easy enough to provide a complex type.
Upvotes: 1