Sekhar

Reputation: 699

Create multiple columns from existing Hive table columns

How can I create multiple columns from an existing Hive table? The example data looks like this:

[image: example data]

My requirement is to create two new columns from the existing table, populated only when a condition is met: col1 when code=1 and col2 when code=2.

Expected output:

[image: expected output]

How can I achieve this with a Hive query?

Upvotes: 1

Views: 427

Answers (1)

leftjoin

Reputation: 38290

If you aggregate the required values into arrays, you can explode both arrays and keep only the rows where the positions match.

Demo:

with 

my_table as (--use your table instead of this CTE
select stack(8,
'a',1,
'b',2,
'c',3,
'b1',2,
'd',4,
'c1',3,
'a1',1,
'd1',4
) as (col, code)
)

select c1.val as col1, c2.val as col2 from
(
-- collect the values for code 1 and code 2 into two separate arrays
select collect_set(case when code=1 then col else null end) as col1,
       collect_set(case when code=2 then col else null end) as col2 
  from my_table where code in (1,2)
)s lateral view outer posexplode(col1) c1 as pos, val  
   lateral view outer posexplode(col2) c2 as pos, val
-- keep only the pairs that sit at the same position in both arrays
where c1.pos=c2.pos

Result:

col1    col2
a       b
a1      b1

This approach does not work if the arrays differ in size: the filter c1.pos=c2.pos drops every position that exists in only one of the arrays.

Another approach: calculate row_number and full join on it. This works even if col1 and col2 have a different number of values (the missing values will be null):

with 

my_table as (--use your table instead of this CTE
select stack(8,
'a',1,
'b',2,
'c',3,
'b1',2,
'd',4,
'c1',3,
'a1',1,
'd1',4
) as (col, code)
),

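ordered as
(
-- number the values within each code so that they can be paired by position
select code, col, row_number() over(partition by code order by col) rn
  from my_table where code in (1,2)
)

-- pair the n-th value of code 1 with the n-th value of code 2;
-- full join keeps unmatched values from either side, with null in the other column
select c1.col as col1, c2.col as col2
  from (select * from ordered where code=1) c1 
       full join 
       (select * from ordered where code=2) c2 on c1.rn = c2.rn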

Result:

col1    col2
a       b
a1      b1
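
The same pairing can probably also be written without a join, by grouping the ordered CTE on rn and pivoting with conditional aggregation. This is only a sketch of that variant, not part of the original answer; it reuses the demo data and the my_table and ordered CTEs from the query above, and it assumes each (code, rn) pair occurs at most once, which row_number guarantees here:

```sql
with

my_table as ( -- same demo data as above; use your table instead of this CTE
select stack(8,
'a',1,
'b',2,
'c',3,
'b1',2,
'd',4,
'c1',3,
'a1',1,
'd1',4
) as (col, code)
),

ordered as
(
-- number the values within each code, as in the full join version
select code, col, row_number() over(partition by code order by col) rn
  from my_table where code in (1,2)
)

-- one output row per rn; each case expression is non-null for at most
-- one row in the group, so max() simply picks that value (or null)
select max(case when code=1 then col end) as col1,
       max(case when code=2 then col end) as col2
  from ordered
 group by rn
```

On the demo data this should give the same two rows as the full join. If one code has more values than the other, the extra rows get null in the other column. The row order of the result is not guaranteed unless you add an explicit order by.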

Upvotes: 1
