Reputation: 107
In below Hive-query, I need to map child-tags under parent-tags with same value from the XML content. As of now, cross join is happening since the parent-tag value "ABCD" repeats here.
with your_data as (
select '<ParentArray>
<ParentFieldArray>
<Name>ABCD</Name>
<Value>
<string>111</string>
<string></string>
</Value>
</ParentFieldArray>
<ParentFieldArray>
<Name>ABCD</Name>
<Value>
<string/>
<string>444</string>
<string>555</string>
</Value>
</ParentFieldArray>
</ParentArray>' as xmlinfo
)
select name, case when value='NULL' then '' else value end value
from (select regexp_replace(xmlinfo,'<string></string>|<string/>','<string>NULL</string>') xmlinfo
from your_data d
) d
lateral view outer explode(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as Name
lateral view outer explode(XPATH(xmlinfo, concat('ParentArray/ParentFieldArray[Name="', pf.Name, '"]/Value/string/text()'))) vl as value;
Expected output from query:
Name Value
ABCD 111
ABCD
ABCD
ABCD 444
ABCD 555
Upvotes: 1
Views: 166
Reputation: 38335
You can use posexplode() instead of explode() to get position in addition to Name. Then filter array by position in second XPATH, maybe you do not need Name filter in this case, debug it on bigger dataset. I used both: Name and index filters, it works for your data example. Positions in XPATH started from 1 and in Hive posexplode position started from 0, this is why pos+1 is used:
with your_data as (
select '<ParentArray>
<ParentFieldArray>
<Name>ABCD</Name>
<Value>
<string>111</string>
<string></string>
</Value>
</ParentFieldArray>
<ParentFieldArray>
<Name>ABCD</Name>
<Value>
<string/>
<string>444</string>
<string>555</string>
</Value>
</ParentFieldArray>
</ParentArray>' as xmlinfo
)
select name, pos+1 as pos, case when value='NULL' then '' else value end value
from (select regexp_replace(xmlinfo,'<string></string>|<string/>','<string>NULL</string>') xmlinfo
from your_data d
) d
lateral view outer posexplode(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as pos, Name
lateral view outer explode(XPATH(xmlinfo, concat('((ParentArray/ParentFieldArray)[',pf.pos+1, '])[Name="', pf.Name, '"]/Value/string/text()'))) vl as value;
Result:
name pos value
ABCD 1 111
ABCD 1
ABCD 2
ABCD 2 444
ABCD 2 555
Upvotes: 1