Reputation: 41
My data is structured like in the below table:
| Name | Foo_A | Foo_B | Foo_C | Bar_A | Bar_B | Bar_C |
--------------------------------------------------------
| abcd | 16 | 32 | 14 | 52 | 41 | 17 |
| ... | ... | ... | ... | ... | ... | ... |
I am looking to query the data in Hive in a way such that it looks like this:
| Name | Class | FooVal | BarVal |
----------------------------------
| abcd | A | 16 | 52 |
| abcd | B | 32 | 41 |
| abcd | C | 14 | 17 |
| ... | ... | ... | ... |
I am already aware of and am using a UNION ALL, but what would be a more efficient way of doing this using "LATERAL VIEW explode" a map data type?
Upvotes: 1
Views: 2123
Reputation: 26
Hive unpivot multiple columns:
select
t1.ID,
lv.key as class,
lv.FooStr.col1 as FooVal,
lv.FooStr.col2 as BarVal
from
table t1
LATERAL VIEW explode (
map(
'A', named_struct('col1', Foo_A, 'col2', Bar_A),
'B', named_struct('col1', Foo_B, 'col2', Bar_B),
'C', named_struct('col1', Foo_C, 'col2', Bar_C)
)) lv as key, FooStr
where
coalesce(lv.FooStr.col1, lv.FooStr.col2) IS NOT NULL
Upvotes: 0
Reputation: 41
Thanks for reply! Please find below another way of doing it which is faster than CROSS JOIN.
select t1.ID, t2.key_1 as class, t2.FooVal, t3.BarVal
from table t1
LATERAL VIEW explode (map(
'A', Foo_A,
'B', Foo_B,
'C', Foo_C
)) t2 as key_1, FooVal
LATERAL VIEW explode (map(
'A', Bar_A,
'B', Bar_B,
'C', Bar_C
)) t3 as key_2, BarVal
where t2.key_1 = t3.key_2;
Upvotes: 2
Reputation: 38325
CROSS JOIN with class
stack
(see code example) will multiply main table rows x3, one row per class
, then use case
statements to derive your columns depending on class
value. CROSS JOIN with small dataset (3 rows) should be transformed to map join and will execute very fast on mappers.
set hive.auto.convert.join=true; --this enables map-join
select t.Name,
s.class,
case s.class when 'A' then t.Foo_A
when 'B' then t.foo_B
when 'C' then t.foo_C
end as FooVal,
case s.class when 'A' then t.Bar_A
when 'B' then t.Bar_B
when 'C' then t.Bar_C
end as BarVal
from table t
cross join (select stack(3,'A','B','C') as class) s
;
It will scan the table only once and perform much better than UNION ALL approach.
Upvotes: 2