Reputation: 355
I have datastructure that looks like:
ID purchase_name purchase_id
A001 {A: a} {One: 1}
A002 {A: a, B: b} {One: 1, Two, 2}
A003 {C: a} {Three: 1}
Think of it this way: {A, B, C...} are the brand name (i.e. Nike, Adidas...); {a, b, c...} are the categories under the brand (shirts, shorts, shoes...); (One, Two, Three...} are the internal id for the brand names and (1, 2, 3...) are the internal id for the category names. Now I would like to explode it to:
ID Brand_name Brand_id Category_name Category_id
A001 A One a 1
A002 A One a 1
A002 B Two b 2
A003 C Three a 1
If there is just one map column to explode, e.g. purchase_name, I am able to use explode to generate brand_name column and category_name column. However, I am trying to use posexplode to get a position variable in order to eliminate multiple rows that would be generated for A001, but I found out that posexplode is only applicable to arrays, not map datatype.
Upvotes: 1
Views: 197
Reputation: 1
First,I suppose your purchase_name
,purchase_id
data type is string ,not map,then we need to divide this question into three tasks:
Remove irrelevant characters, such as '{', '}', and '"'
explode array into rows using posexplode
split the map by ":"
data preparation
drop table temp_table;
CREATE TABLE temp_table (
ID STRING,
purchase_name string,
purchase_id string
);
INSERT INTO temp_table VALUES
('A001', '{A: a}','{One: 1}'),
('A002', '{A: a, B: b}','{One: 1, Two: 2}'),
('A003', '{C: a}','{Three: 1}');
sql:
SELECT id
,split(element, ':') [0] Brand_name
,split(element2, ':') [0] Brand_id
,split(element, ':') [1] Category_name
,split(element2, ':') [1] Category_id
FROM (
SELECT id
,pos.pos
,element
,pos2.pos2
,element2
FROM temp_table LATERAL VIEW posexplode(split(replace(replace(purchase_name, '{', ''), '}', ''), ',')) pos AS pos
,element LATERAL VIEW posexplode(split(replace(replace(purchase_id, '{', ''), '}', ''), ',')) pos2 AS pos2
,element2
) t1
WHERE pos = pos2
output:
id brand_name brand_id category_name category_id
A001 A One a 1
A002 A One a 1
A002 B Two b 2
A003 C Three a 1
Upvotes: 0