Reputation: 1
I have a hive table with one of the example row:
A, B, [“11”, “12”, “13”], [“1”, “2”] The array lengths of columns 3 and 4 are sometimes the same, and sometimes not.
Output:
A, B, 11, 1
A, B, 12, 2
A, B, 13, [null]
I am new to hive and really need help figuring this out.
The length of arrays keeps changing. It’s not static. sometimes it’s of length 3 or 4 or 5. That’s where i’m stuck.
I used the posexplode function in hive
Upvotes: 0
Views: 30
Reputation: 1
Here, we need to divide it into three tasks:
Remove irrelevant characters, such as '[', ']', and '"'
explode array into rows using posexplode
as you said,it is important to note that holding the array numbers
concat the results from the explode result while keeping any missing values as null,so we need left join
.
the hive sql is below:
WITH data1
AS (
SELECT 'A' col1
,'B' col2
,'["11", "12", "13"]' AS array_str1
,'["1", "2"]' AS array_str2
)
,table1
AS (
SELECT col1
,col2
,pos
,element
FROM data1 LATERAL VIEW posexplode(SPLIT(REPLACE(REPLACE(REPLACE(array_str1, '"', ''), '[', ''), ']', ''), ',')) pos AS pos
,element
)
,table2
AS (
SELECT col1
,col2
,pos
,element
FROM data1 LATERAL VIEW posexplode(SPLIT(REPLACE(REPLACE(REPLACE(array_str2, '"', ''), '[', ''), ']', ''), ',')) pos AS pos
,element
)
SELECT t1.col1
,t1.col2
,t1.element
,t2.element
FROM table1 t1
LEFT JOIN table2 t2 ON t1.col1 = t2.col1
AND t1.col2 = t2.col2
AND t1.pos = t2.pos
output:
col1 col2 element element
A B 11 1
A B 12 2
A B 13 NULL
Upvotes: 0