vito kim
vito kim

Reputation: 1

array into multiple rows in hive

I have a hive table with one of the example row:

A, B, [“11”, “12”, “13”], [“1”, “2”] The array lengths of columns 3 and 4 are sometimes the same, and sometimes not.

Output:

A, B, 11, 1

A, B, 12, 2

A, B, 13, [null]

I am new to hive and really need help figuring this out.

The length of arrays keeps changing. It’s not static. sometimes it’s of length 3 or 4 or 5. That’s where i’m stuck.

I used the posexplode function in hive

Upvotes: 0

Views: 30

Answers (1)

Even Zhang
Even Zhang

Reputation: 1

Here, we need to divide it into three tasks:

  1. Remove irrelevant characters, such as '[', ']', and '"'

  2. explode array into rows using posexplode as you said,it is important to note that holding the array numbers

  3. concat the results from the explode result while keeping any missing values as null,so we need left join.

the hive sql is below:

WITH data1
AS (
    SELECT 'A' col1
        ,'B' col2
        ,'["11", "12", "13"]' AS array_str1
        ,'["1", "2"]' AS array_str2
    )
    ,table1
AS (
    SELECT col1
        ,col2
        ,pos
        ,element
    FROM data1 LATERAL VIEW posexplode(SPLIT(REPLACE(REPLACE(REPLACE(array_str1, '"', ''), '[', ''), ']', ''), ',')) pos AS pos
        ,element
    )
    ,table2
AS (
    SELECT col1
        ,col2
        ,pos
        ,element
    FROM data1 LATERAL VIEW posexplode(SPLIT(REPLACE(REPLACE(REPLACE(array_str2, '"', ''), '[', ''), ']', ''), ',')) pos AS pos
        ,element
    )
SELECT t1.col1
    ,t1.col2
    ,t1.element
    ,t2.element
FROM table1 t1
LEFT JOIN table2 t2 ON t1.col1 = t2.col1
    AND t1.col2 = t2.col2
    AND t1.pos = t2.pos

output:

col1    col2    element element
    A   B   11  1
    A   B    12  2
    A   B    13 NULL

Upvotes: 0

Related Questions