RKR
RKR

Reputation: 107

In Hive, how to get the values from within tag definitions in an XML?

How could the tag definitions be considered while parsing XMLs in Hive query? Below is my XML to be parsed:

<Parents>
    <Parent>
      <emp id="A12345">
        <acc name="perm_ID">44444</acc>
        <acc name="temp_ID">55555</acc>
      </emp>
    </Parent>
</Parents>

There are 3 fields EmpID, PermID and TempID. Expected output is:

EmpID   PermID  TempID
A12345  44444   55555

Upvotes: 1

Views: 380

Answers (1)

leftjoin
leftjoin

Reputation: 38335

XPATH() returns list of nodes, XPATH_STRING() returns scalar string. If you have many emp in XML array, use XPATH() to get a list of Id attributes + lateral view explode to explode rows, then use Id to parametrize XPATH_STRING to get elements from corresponding emp.

See this demo(Hive):

with mytable as (
select '<Parents>
    <Parent>
      <emp id="A12345">
        <acc name="perm_ID">44444</acc>
        <acc name="temp_ID">55555</acc>
      </emp>
    </Parent>
</Parents>' as xml
)

select e.Id as EmpId, 
      XPATH_STRING(t.xml,concat('/Parents/Parent/emp[@id="',e.Id,'"]/acc[@name="perm_ID"]/text()')) permID,
      XPATH_STRING(t.xml,concat('/Parents/Parent/emp[@id="',e.Id,'"]/acc[@name="temp_ID"]/text()')) tempID
  from mytable t
       lateral view outer explode(XPATH(t.xml, '/Parents/Parent/emp/@id')) e as  Id

Result:

empid   permid    tempid    
A12345  44444     55555

In this demo you can see how to deal with arrays and scalar elements and attributes, how to parametrize your xpath using values extracted by other xpath.

See also other Hive XPATH functions: XPathUDF

Upvotes: 1

Related Questions