Ivan
Ivan

Reputation: 703

How to explode an Array and create a view in Hive?

I have the following data where id is an Integer and vectors is an array:

id, vectors
1, [1,2,3]
2, [2,3,4]
3, [3,4,5]

I would like to explode the vectors column with its index postioning such that it looks like this:

+---+-----+------+
|id |index|vector|
+---+-----+------+
|1  |0    |1     |
|1  |1    |2     |
|1  |2    |3     |
|2  |0    |2     |
|2  |1    |3     |
|2  |2    |4     |
|3  |0    |3     |
|3  |1    |4     |
|3  |2    |5     |
+---+-----+------+

I figured that I can do this using Spark Scala using selectExpr

df.selectExpr("*", "posexplode(vectors) as (index, vector)")

However, this is a relatively simple task and I would like to avoid writing ETL scripts and was thinking if there is anyway the expression can be used and creating a view for easy access through Presto.

Upvotes: 0

Views: 1395

Answers (2)

DatabaseCoder
DatabaseCoder

Reputation: 2032

You can use Lateral view of Hive to explode array data. Try below query -

select
    id, (row_number() over (partition by id order by col)) -1 as `index`, col as vector
from (
    select 1 as id, array(1,2,3) as vectors from (select '1') t1 union all
    select 2 as id, array(2,3,4) as vectors from (select '1') t2 union all
    select 3 as id, array(3,4,5) as vectors from (select '1') t3
  ) t
LATERAL VIEW explode(vectors) v;

Upvotes: 0

Martin Traverso
Martin Traverso

Reputation: 5316

This is easy to do in Presto using standard SQL syntax with UNNEST:

WITH data(id, vector) AS (
    VALUES
    (1, array[1,2,3]),
    (2, array[2,3,4]),
    (3, array[3,4,5])
)
SELECT id, index - 1 AS index, value
FROM data, UNNEST(vector) WITH ORDINALITY AS t(value, index)

Note that the index produced by WITH ORDINALITY is 1-based, so I subtracted 1 from it to produce the output you included in your question.

Upvotes: 4

Related Questions