Defcon

Reputation: 817

Hive Serde Xpath Extract

I have an XML document that needs to be extracted in Hive, and I am using the Hive XML SerDe to do this. The requirement is to store the XML in a single column as a string. However, when I do this the attributes come out reversed, which I assumed was because XPath populates from the bottom up. I am trying to get the column to show the XML exactly as it appears in the source. It seems Hive automatically alphabetizes the attributes.

Input:

 <example>
    <context>
         <field1 b_attribute ="first" a_attribute1 ="second" ></field1>
    </context>
 </example>

What I am getting now:

 <example>
    <context>
         <field1 a_attribute1 ="second" b_attribute ="first" ></field1>
    </context>
 </example>

Expected Output:

 <example>
    <context>
         <field1 b_attribute ="first" a_attribute1 ="second" ></field1>
    </context>
 </example>

Hive Serde Creation:

create external table EXAMPLE (
example_xml string
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.example_xml"="reverse(/context/*)"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 'mypathinhdfs'
TBLPROPERTIES (
"xmlinput.start"="<example>",
"xmlinput.end"="</example>"
);

Upvotes: 0

Views: 294

Answers (1)

David דודו Markovitz

Reputation: 44951

I can't reproduce the issue.

hive> create external table EXAMPLE (
    > example_xml string
    > )
    > ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
    > WITH SERDEPROPERTIES (
    > "column.xpath.example_xml"="/"
    > )
    > STORED AS
    > INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
    > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
    > LOCATION '/user/hive/warehouse/example'
    > TBLPROPERTIES (
    > "xmlinput.start"="<example>",
    > "xmlinput.end"="</example>"
    > );
OK
Time taken: 0.186 seconds
hive> select * from EXAMPLE;
OK
example.example_xml
<example><context><field1 attribute="first" attribute1="second"/></context></example>
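
As a side note, the XML specification treats the order of attributes within a start tag as not significant, so an XML parser sees the "input" and "what I am getting now" forms from the question as the same element. Here is a minimal Python sketch that checks this outside of Hive. The helper name `field1_attributes` and the inlined sample strings are illustrative assumptions, not part of the SerDe setup:

```python
import xml.etree.ElementTree as ET

# Sample documents modeled on the question: same element, different attribute order.
original = ('<example><context>'
            '<field1 b_attribute="first" a_attribute1="second"></field1>'
            '</context></example>')
from_hive = ('<example><context>'
             '<field1 a_attribute1="second" b_attribute="first"></field1>'
             '</context></example>')


def field1_attributes(xml_text):
    """Parse the document and return field1's attributes as a plain dict."""
    root = ET.fromstring(xml_text)
    return dict(root.find("context/field1").attrib)


# Dict comparison ignores ordering, so this compares attribute content only.
same_content = field1_attributes(original) == field1_attributes(from_hive)
# Raw string comparison is sensitive to attribute order.
same_text = original == from_hive

print("same attributes:", same_content)
print("same raw text:", same_text)
```

If the first check holds and the second does not, the reordering only matters to consumers that compare the stored string byte for byte rather than parsing it.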

Upvotes: 1
