Reputation: 12512
I have this table.
+------------------------------------------------------------+
| ks | time | val1 | val2 |
+-------------+---------------+---------------+--------------+
| A | 1 | 1 | 1 |
| B | 1 | 3 | 5 |
| A | 2 | 6 | 7 |
| B | 2 | 10 | 12 |
| A | 4 | 6 | 7 |
| B | 4 | 20 | 26 |
+------------------------------------------------------------+
What I want to get is for each row,
ks | time | val1 | val1 of next ts of same ks |
To be clear, result of above example should be,
+------------------------------------------------------------+
| ks | time | val1 | next.val1 |
+-------------+---------------+---------------+--------------+
| A | 1 | 1 | 6 |
| B | 1 | 3 | 10 |
| A | 2 | 6 | 6 |
| B | 2 | 10 | 20 |
| A | 4 | 6 | null |
| B | 4 | 20 | null |
+------------------------------------------------------------+
(I need the same next for value2 as well)
I tried a lot to come up with a hive query for this, but still no luck. I was able to write a query for this in sql as mentioned here (Quassnoi's answer), but couldn't create the equivalent in hive because hive doesn't support subqueries in select.
Can someone please help me achieve this?
Thanks in advance.
EDIT:
Query I tried was,
SELECT ks, time, val1, next[0] as next.val1 from
(SELECT ks, time, val1
COALESCE(
(
SELECT Val1, time
FROM myTable mi
WHERE mi.val1 > m.val1 AND mi.ks = m.ks
ORDER BY time
LIMIT 1
), CAST(0 AS BIGINT)) AS next
FROM myTable m
ORDER BY time) t2;
Upvotes: 2
Views: 4677
Reputation: 6443
I find that using Hive custom map/reduce functionality works great to solve queries similar to this. It gives you the opportunity to consider a set of input and "reduce" to one (or more) results.
This answer discusses the solution.
The key is that you use CLUSTER BY
to send all results with similar key value to the same reducer, hence same reduce script, collect accordingly, and then output the reduced results when the key changes, and start collecting for the new key.
Upvotes: 2
Reputation: 5940
Your query seems quite similar to the "year ago" reporting that is ubiquitous in financial reporting. I think a LEFT OUTER JOIN
is what you are looking for.
We join table myTable
to itself, naming the two instances of the same table m
and n
. For every entry in the first table m
we will attempt to find a matching record in n
with the same ks
value but an incremented value of time
. If this record does not exist, all column values for n
will be NULL
.
SELECT
m.ks,
m.time,
m.val1,
n.val1 as next_val1,
m.val2,
n.val2 as next_val2
FROM
myTable m
LEFT OUTER JOIN
myTable n
ON (
m.ks = n.ks
AND
m.time + 1 = n.time
);
Returns the following.
ks time val1 next_val1 val2 next_val2
A 1 1 6 1 7
A 2 6 6 7 7
A 3 6 NULL 7 NULL
B 1 3 10 5 12
B 2 10 20 12 26
B 3 20 NULL 26 NULL
Hope that helps.
Upvotes: 2