RKR
RKR

Reputation: 107

In Hive, is there a way to parse multiple occurrences of the same tag within the same XML family?

In my scenario, there's one occurrence of "Name" tag under each parent family. The parent families repeat, and within each family there are multiple "Value" tags. My expectation is to parse through each [Name, Value] pair and get them displayed row-wise.

Sample XML and the expected output are shown below:

<ParentArray>
    <ParentFieldArray>
        <Name>ABCD</Name>
        <Value>
            <string>111</string>
            <string>222</string>
            <string>333</string>
        </Value>
    </ParentFieldArray>
    <ParentFieldArray>
        <Name>EFGH</Name>
        <Value>
            <string>444</string>
            <string>555</string>
        </Value>
    </ParentFieldArray>
</ParentArray>

SQL query's output should be:

Name         Value
ABCD         111
ABCD         222
ABCD         333
EFGH         444
EFGH         555

Here "ParentFieldArray" families repeat, and number of "Value" tags within them too vary, family-wise.

Query attempted:

select Name, Value from <table_name> -- "xmlinfo" field in this table includes the above XML content
LATERAL VIEW POSEXPLODE(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) NM as Name_pos, Name
LATERAL VIEW POSEXPLODE(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Value/string/text()')) VL as Value_pos, Value;

I tried to query with LATERAL VIEW POSEXPLODE(XPATH(..)) concept, but it doesn't seem to work. Basically I'm unable to map the right Name against each Value, based on their positions. This is resulting in cross join.

Is there any other function/concept within SQL that could fetch output as shown above?

Upvotes: 2

Views: 283

Answers (1)

leftjoin
leftjoin

Reputation: 38335

Get Name and pass it to second XPATH to filter only ParentFieldArray containing that Name.

Demo:

with your_data as (
select  '<ParentArray>
    <ParentFieldArray>
        <Name>ABCD</Name>
        <Value>
            <string>111</string>
            <string>222</string>
            <string>333</string>
        </Value>
    </ParentFieldArray>
    <ParentFieldArray>
        <Name>EFGH</Name>
        <Value>
            <string>444</string>
            <string>555</string>
        </Value>
    </ParentFieldArray>
</ParentArray>' as xmlinfo
)

select name, value 
  from your_data d
       lateral view outer explode(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as  Name
       lateral view outer explode(XPATH(xmlinfo, concat('ParentArray/ParentFieldArray[Name="', pf.Name, '"]/Value/string/text()'))) vl as value

Result:

name    value
ABCD    111
ABCD    222
ABCD    333
EFGH    444
EFGH    555

Upvotes: 2

Related Questions