Reputation: 107
In below Hive-query, the XML consists of Parents tag with 4 Parent families and 4 ParentArray families within. Under each ParentArray, there are ParentFieldArray occurrences consisting of the same Name and Value tags (ABCD and 111 resp.).
with your_data as (
select '<Parents>
<Parent>
<ParentArray>
<ParentFieldArray>
<Name>ABCD</Name>
<Value>
<string>111</string>
</Value>
</ParentFieldArray>
</ParentArray>
</Parent>
<Parent>
<ParentArray>
<ParentFieldArray>
<Name>ABCD</Name>
<Value>
<string>111</string>
</Value>
</ParentFieldArray>
</ParentArray>
</Parent>
<Parent>
<ParentArray>
<ParentFieldArray>
<Name>ABCD</Name>
<Value>
<string>111</string>
</Value>
</ParentFieldArray>
</ParentArray>
</Parent>
<Parent>
<ParentArray>
<ParentFieldArray>
<Name>ABCD</Name>
<Value>
<string>111</string>
</Value>
</ParentFieldArray>
</ParentArray>
</Parent>
</Parents>' as xmlinfo
)
select name, pos+1 as pos, value
from your_data d
lateral view outer posexplode(XPATH(xmlinfo, 'Parents/Parent/ParentArray/ParentFieldArray/Name/text()')) pf as pos, Name
lateral view outer explode(XPATH(xmlinfo, concat('Parents/Parent/ParentArray/ParentFieldArray[',pf.pos+1, '][Name="', pf.Name, '"]/Value/string/text()'))) vl as value;
The above query is populating all "111" rows under 1st index itself and NULL values under indices 2, 3 and 4.
Expected output from query:
name pos value
ABCD 1 111
ABCD 2 111
ABCD 3 111
ABCD 4 111
Upvotes: 1
Views: 112
Reputation: 38335
It was a bug in XPATH. [] takes precedence and producing weird results. Use brackets.
with your_data as (
select '<Parents>
<Parent>
<ParentArray>
<ParentFieldArray>
<Name>ABCD</Name>
<Value>
<string>111</string>
</Value>
</ParentFieldArray>
</ParentArray>
</Parent>
<Parent>
<ParentArray>
<ParentFieldArray>
<Name>ABCD</Name>
<Value>
<string>111</string>
</Value>
</ParentFieldArray>
</ParentArray>
</Parent>
<Parent>
<ParentArray>
<ParentFieldArray>
<Name>ABCD</Name>
<Value>
<string>111</string>
</Value>
</ParentFieldArray>
</ParentArray>
</Parent>
<Parent>
<ParentArray>
<ParentFieldArray>
<Name>ABCD</Name>
<Value>
<string>111</string>
</Value>
</ParentFieldArray>
</ParentArray>
</Parent>
</Parents>' as xmlinfo
)
select pos+1 as pos, Name, Value
from your_data d
lateral view outer posexplode(XPATH(xmlinfo, 'Parents/Parent/ParentArray/ParentFieldArray/Name/text()')) pf as pos, Name
lateral view outer explode(XPATH(xmlinfo, concat('((Parents/Parent/ParentArray/ParentFieldArray)[',pf.pos+1, '])[Name="', pf.Name, '"]/Value/string/text()'))) vl as value
;
Result:
pos name value
1 ABCD 111
2 ABCD 111
3 ABCD 111
4 ABCD 111
Upvotes: 1