Reputation: 817
I have XML that needs to be extracted in Hive, and I am using a Hive SerDe to do this. The requirement is to store the XML in a single column as a string. However, when I do this the attributes come out in a different order: they appear reversed, as if XPath populates from the bottom up. It seems Hive automatically alphabetizes the attributes. I am trying to get the XML to show up exactly as it appears in the input.
Input:
<example>
  <context>
    <field1 b_attribute ="first" a_attribute1 ="second" ></field1>
  </context>
</example>
What I am getting now:
<example>
  <context>
    <field1 a_attribute1 ="second" b_attribute ="first" ></field1>
  </context>
</example>
Expected Output:
<example>
  <context>
    <field1 b_attribute ="first" a_attribute1 ="second" ></field1>
  </context>
</example>
Hive Serde Creation:
create external table EXAMPLE (
    example_xml string
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
    "column.xpath.example_xml"="reverse(/context/*)"
)
STORED AS
    INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 'mypathinhdfs'
TBLPROPERTIES (
    "xmlinput.start"="<example>",
    "xmlinput.end"="</example>"
);
Upvotes: 0
Views: 294
Reputation: 44951
I can't reproduce the issue.
hive> create external table EXAMPLE (
> example_xml string
> )
> ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
> WITH SERDEPROPERTIES (
> "column.xpath.example_xml"="/"
> )
> STORED AS
> INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
> LOCATION '/user/hive/warehouse/example'
> TBLPROPERTIES (
> "xmlinput.start"="<example>",
> "xmlinput.end"="</example>"
> );
OK
Time taken: 0.186 seconds
hive> select * from EXAMPLE;
OK
example.example_xml
<example><context><field1 attribute="first" attribute1="second"/></context></example>
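As a side note on why the order may not matter downstream: the XML 1.0 specification states that the order of attribute specifications in a start-tag is not significant, so a parser or serializer is free to emit them in a different order. The following is a minimal Python sketch, separate from the Hive setup, that checks whether the input and the reordered output from the question carry the same attributes. The helper name `field_attributes` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The two documents from the question, differing only in attribute order.
original = (
    '<example><context>'
    '<field1 b_attribute ="first" a_attribute1 ="second" ></field1>'
    '</context></example>'
)
reordered = (
    '<example><context>'
    '<field1 a_attribute1 ="second" b_attribute ="first" ></field1>'
    '</context></example>'
)


def field_attributes(xml_text):
    """Return the attributes of /example/context/field1 as a plain dict.

    Hypothetical helper for this sketch; it is not part of any Hive API.
    """
    root = ET.fromstring(xml_text)
    return dict(root.find("context/field1").attrib)


# Dict comparison ignores ordering, which mirrors how XML treats attributes.
same = field_attributes(original) == field_attributes(reordered)
print(same)  # prints True
```

If a consumer needs the bytes to match the input exactly, this kind of comparison will not help, and the XML would have to be kept as raw text rather than passed through an XML parser and serializer.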
Upvotes: 1