Reputation: 107
How could the tag definitions be considered while parsing XMLs in Hive query? Below is my XML to be parsed:
<Parents>
<Parent>
<emp id="A12345">
<acc name="perm_ID">44444</acc>
<acc name="temp_ID">55555</acc>
</emp>
</Parent>
</Parents>
There are 3 fields EmpID, PermID and TempID. Expected output is:
EmpID PermID TempID
A12345 44444 55555
Upvotes: 1
Views: 380
Reputation: 38335
XPATH()
returns list of nodes, XPATH_STRING()
returns scalar string. If you have many emp in XML array, use XPATH()
to get a list of Id attributes + lateral view explode
to explode rows, then use Id
to parametrize XPATH_STRING to get elements from corresponding emp.
See this demo(Hive):
with mytable as (
select '<Parents>
<Parent>
<emp id="A12345">
<acc name="perm_ID">44444</acc>
<acc name="temp_ID">55555</acc>
</emp>
</Parent>
</Parents>' as xml
)
select e.Id as EmpId,
XPATH_STRING(t.xml,concat('/Parents/Parent/emp[@id="',e.Id,'"]/acc[@name="perm_ID"]/text()')) permID,
XPATH_STRING(t.xml,concat('/Parents/Parent/emp[@id="',e.Id,'"]/acc[@name="temp_ID"]/text()')) tempID
from mytable t
lateral view outer explode(XPATH(t.xml, '/Parents/Parent/emp/@id')) e as Id
Result:
empid permid tempid
A12345 44444 55555
In this demo you can see how to deal with arrays and scalar elements and attributes, how to parametrize your xpath using values extracted by other xpath.
See also other Hive XPATH functions: XPathUDF
Upvotes: 1