Sai
Sai

Reputation: 97

Hive table Array Columns - explode using array_index

Hi i have a Hive table

select a,b,c,d from riskfactor_table 
In the above table B, C and D columns are array columns. Below is my Hive DDL 
Create external table riskfactor_table 
(a string, 
b array<string>, 
c array<double>, 
d array<double> ) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '~'  
stored as textfile location 'user/riskfactor/data'; 

Here is my table data:

ID400S,["jms","jndi","jaxb","jaxn"],[100,200,300,400],[1,2,3,4]
ID200N,["one","two","three"],[212,352,418],[6,10,8]

If i want to split array columns how can i split?
If i use explode function i can split array values for only one column

select explode(b) as b from riskfactor_table;

Output:

jms  
jndi  
jaxb  
jxn  
one  
two  
three

But i want all the columns to be populated using one select statement below-

Query - select a,b,c,d from risk_factor;

Output:

row1-  ID400S    jms    100    1  
row2-  ID400S    jndi   200    2  
row3-  ID400S    jaxb    300    3  
row4-  ID400S    jaxn    400    4  

How can i populate all the data?

Upvotes: 1

Views: 16205

Answers (3)

user2359902
user2359902

Reputation: 121

I was also looking for same question's solution. Thanks Jerome, for this Brickhouse solution.

I had to make a slight change (addition of alias "n1 as n") as below to make it work for my case:

hive> describe test;
OK
id              string
animals     array<string>
cnt         array<bigint>

hive> select * from test;
OK
abc     ["cat","dog","elephant","dolphin","snake","parrot","ant","frog","kuala","cricket"]      [10597,2027,1891,1868,1804,1511,1496,1432,1305,1299]

hive> select `id`, array_index(`animals`,n), array_index(`cnt`,n) from test lateral view numeric_range(0,10) n1 as n;
OK
abc     cat             10597
abc     dog             2027
abc     elephant        1891
abc     dolphin         1868
abc     snake           1804
abc     parrot          1511
abc     ant             1496
abc     frog            1432
abc     kuala           1305
abc     cricket         1299

The only thing is I have to know beforehand that there are 10 elements to be exploded.

Upvotes: 1

Jerome Banks
Jerome Banks

Reputation: 1630

Use the 'numeric_range' UDF from Brickhouse. Here is a blog posting describing the details.

https://brickhouseconfessions.wordpress.com/2013/03/07/exploding-multiple-arrays-at-the-same-time-with-numeric_range/

In your case, your query would be something like

SELECT a, 
       array_index( b, i ),
       array_index( c, i ),
       array_index( d, i )
FROM risk_factor_table
 LATERAL VIEW numeric_range( 0, 3 );

Upvotes: 1

sandeep rawat
sandeep rawat

Reputation: 4957

You can achieve this using LATERAL VIEW

       SELECT Mycoulmna, Mycoulmnb ,Mycoulmnc
                 FROM  riskfactor_table
             LATERAL VIEW explode(a) myTablea AS Mycoulmna
             LATERAL VIEW explode(a) myTableb AS Mycoulmnb
             LATERAL VIEW explode(a) myTablec AS Mycoulmnc ;

for more detail go throw it .

Upvotes: 1

Related Questions