RKR
RKR

Reputation: 107

In Hive, how to explode child-tags under identical parent-tags present within an XML?

In below Hive-query, I need to map child-tags under parent-tags with same value from the XML content. As of now, cross join is happening since the parent-tag value "ABCD" repeats here.

with your_data as (
select  '<ParentArray>
    <ParentFieldArray>
        <Name>ABCD</Name>
        <Value>
            <string>111</string>
            <string></string>
        </Value>
    </ParentFieldArray>
    <ParentFieldArray>
        <Name>ABCD</Name>
        <Value>
            <string/>
            <string>444</string>
            <string>555</string>
        </Value>
    </ParentFieldArray>
</ParentArray>' as xmlinfo
)
select name, case when value='NULL' then '' else value end value
  from (select regexp_replace(xmlinfo,'<string></string>|<string/>','<string>NULL</string>') xmlinfo 
          from your_data d
       ) d
       lateral view outer explode(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as  Name
       lateral view outer explode(XPATH(xmlinfo, concat('ParentArray/ParentFieldArray[Name="', pf.Name, '"]/Value/string/text()'))) vl as value;

Expected output from query:

Name    Value
ABCD    111
ABCD    
ABCD       
ABCD    444
ABCD    555

Upvotes: 1

Views: 166

Answers (1)

leftjoin
leftjoin

Reputation: 38335

You can use posexplode() instead of explode() to get position in addition to Name. Then filter array by position in second XPATH, maybe you do not need Name filter in this case, debug it on bigger dataset. I used both: Name and index filters, it works for your data example. Positions in XPATH started from 1 and in Hive posexplode position started from 0, this is why pos+1 is used:

with your_data as (
select  '<ParentArray>
    <ParentFieldArray>
        <Name>ABCD</Name>
        <Value>
            <string>111</string>
            <string></string>
        </Value>
    </ParentFieldArray>
    <ParentFieldArray>
        <Name>ABCD</Name>
        <Value>
            <string/>
            <string>444</string>
            <string>555</string>
        </Value>
    </ParentFieldArray>
</ParentArray>' as xmlinfo
)
select name, pos+1 as pos, case when value='NULL' then '' else value end value
  from (select regexp_replace(xmlinfo,'<string></string>|<string/>','<string>NULL</string>') xmlinfo 
          from your_data d
       ) d
       lateral view outer posexplode(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as  pos, Name
       lateral view outer explode(XPATH(xmlinfo, concat('((ParentArray/ParentFieldArray)[',pf.pos+1, '])[Name="', pf.Name, '"]/Value/string/text()'))) vl as value;

Result:

name    pos value
ABCD    1   111
ABCD    1   
ABCD    2   
ABCD    2   444
ABCD    2   555

Upvotes: 1

Related Questions