liz
liz

Reputation: 41

How to efficiently unpivot MULTIPLE columns in Hive?

My data is structured like in the below table:

| Name | Foo_A | Foo_B | Foo_C | Bar_A | Bar_B | Bar_C |
--------------------------------------------------------
| abcd |    16 |    32 |    14 |    52 |    41 |    17 |
| ...  |   ... |   ... |   ... |   ... |   ... |   ... |

I am looking to query the data in Hive in a way such that it looks like this:

| Name | Class | FooVal | BarVal |
----------------------------------
| abcd | A     |     16 |     52 |
| abcd | B     |     32 |     41 |
| abcd | C     |     14 |     17 |
| ...  | ...   |    ... |    ... |

I am already aware of and am using a UNION ALL, but what would be a more efficient way of doing this using "LATERAL VIEW explode" a map data type?

Upvotes: 1

Views: 2123

Answers (3)

Artem Popov
Artem Popov

Reputation: 26

Hive unpivot multiple columns:

select
        t1.ID,
        lv.key         as class,
        lv.FooStr.col1 as FooVal,
        lv.FooStr.col2 as BarVal
    from
        table t1
        LATERAL VIEW explode (
            map(
               'A', named_struct('col1', Foo_A, 'col2', Bar_A),
               'B', named_struct('col1', Foo_B, 'col2', Bar_B),
               'C', named_struct('col1', Foo_C, 'col2', Bar_C)
                )) lv as key, FooStr
    where
        coalesce(lv.FooStr.col1, lv.FooStr.col2) IS NOT NULL

Upvotes: 0

liz
liz

Reputation: 41

Thanks for reply! Please find below another way of doing it which is faster than CROSS JOIN.

    select t1.ID, t2.key_1 as class, t2.FooVal, t3.BarVal
    from table t1
    LATERAL VIEW explode (map(
   'A', Foo_A,
   'B', Foo_B,
   'C', Foo_C
    )) t2 as key_1, FooVal
    LATERAL VIEW explode (map(
    'A', Bar_A,
    'B', Bar_B,
    'C', Bar_C
     )) t3 as key_2, BarVal
     where t2.key_1 = t3.key_2;

Upvotes: 2

leftjoin
leftjoin

Reputation: 38325

CROSS JOIN with class stack (see code example) will multiply main table rows x3, one row per class, then use case statements to derive your columns depending on class value. CROSS JOIN with small dataset (3 rows) should be transformed to map join and will execute very fast on mappers.

set hive.auto.convert.join=true; --this enables map-join

select t.Name,
       s.class,
       case s.class when 'A' then t.Foo_A 
                    when 'B' then t.foo_B
                    when 'C' then t.foo_C
        end as FooVal,
       case s.class when 'A' then t.Bar_A 
                    when 'B' then t.Bar_B
                    when 'C' then t.Bar_C
        end as BarVal              
 from table t 
      cross join (select stack(3,'A','B','C') as class) s
;

It will scan the table only once and perform much better than UNION ALL approach.

Upvotes: 2

Related Questions