ian_chan
ian_chan

Reputation: 355

hive posexplode map datatype

I have datastructure that looks like:

ID       purchase_name        purchase_id
A001    {A: a}                {One: 1}
A002    {A: a, B: b}          {One: 1, Two, 2}
A003    {C: a}                {Three: 1}

Think of it this way: {A, B, C...} are the brand name (i.e. Nike, Adidas...); {a, b, c...} are the categories under the brand (shirts, shorts, shoes...); (One, Two, Three...} are the internal id for the brand names and (1, 2, 3...) are the internal id for the category names. Now I would like to explode it to:

ID      Brand_name Brand_id Category_name Category_id
A001    A          One      a             1
A002    A          One      a             1
A002    B          Two      b             2
A003    C          Three    a             1

If there is just one map column to explode, e.g. purchase_name, I am able to use explode to generate brand_name column and category_name column. However, I am trying to use posexplode to get a position variable in order to eliminate multiple rows that would be generated for A001, but I found out that posexplode is only applicable to arrays, not map datatype.

Upvotes: 1

Views: 197

Answers (1)

Even Zhang
Even Zhang

Reputation: 1

First,I suppose your purchase_name,purchase_id data type is string ,not map,then we need to divide this question into three tasks:

  1. Remove irrelevant characters, such as '{', '}', and '"'

  2. explode array into rows using posexplode

  3. split the map by ":"

data preparation

drop table temp_table;
CREATE TABLE temp_table (
    ID STRING,
    purchase_name string,
    purchase_id string
);
INSERT INTO temp_table VALUES 
('A001', '{A: a}','{One: 1}'),
('A002', '{A: a, B: b}','{One: 1, Two: 2}'),
('A003', '{C: a}','{Three: 1}');

sql:

SELECT id
    ,split(element, ':') [0] Brand_name
    ,split(element2, ':') [0] Brand_id
    ,split(element, ':') [1] Category_name
    ,split(element2, ':') [1] Category_id
FROM (
    SELECT id
        ,pos.pos
        ,element
        ,pos2.pos2
        ,element2
    FROM temp_table LATERAL VIEW posexplode(split(replace(replace(purchase_name, '{', ''), '}', ''), ',')) pos AS pos
        ,element LATERAL VIEW posexplode(split(replace(replace(purchase_id, '{', ''), '}', ''), ',')) pos2 AS pos2
        ,element2
    ) t1
WHERE pos = pos2

output:

id  brand_name  brand_id    category_name   category_id
A001    A   One  a   1
A002    A   One  a   1
A002    B    Two     b   2
A003    C   Three    a   1

Upvotes: 0

Related Questions