user2518751


How do I define the table schema for the Hive XML SerDe correctly?

I've got this XML:

  <AssetCrossReferences Ordered="false">
    <AssetCrossReference AssetID="F7961393-01" Type="Primary Image"/>
    <AssetCrossReference AssetID="M0504-01" Type="Vendor Logo"/>
    <AssetCrossReference AssetID="F7961393-02" Type="Colour Photograph"/>
  </AssetCrossReferences>
  <Specification Ordered="true">

I want the end result to have one line per AssetCrossReference, like this:

AssetID:F7961393-01, Type:Primary Image
AssetID:M0504-01, Type:Vendor Logo
AssetID:F7961393-02, Type:Colour Photograph

How do I do that?


nobody

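Use a STRUCT column. The XPath selects the AssetCrossReference elements, and the SerDe maps each element's AssetID and Type attributes to the struct fields of the same name: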

create external table test 
(
   asset STRUCT<AssetID:STRING,Type:STRING>
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
with serdeproperties 
(
  "column.xpath.asset"="/AssetCrossReferences/AssetCrossReference"
)
stored as inputformat "com.ibm.spss.hive.serde2.xml.XmlInputFormat"
outputformat "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
location "file:///yourfilepath" 
tblproperties 
(
  "xmlinput.start"="<AssetCrossReferences",
  "xmlinput.end"="</AssetCrossReferences>"
);

Then query the table:

select * from test;
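With a single STRUCT column, each AssetCrossReferences block (one record, as delimited by xmlinput.start/xmlinput.end) becomes one row, so `select *` may not give the one-line-per-asset output the question asks for. A variant worth trying, assuming the XML SerDe maps repeated XPath matches to the elements of an ARRAY column (check this against your SerDe version), is an ARRAY<STRUCT<...>> column exploded with LATERAL VIEW. The table name test_assets and the aliases x and a are hypothetical; the location is the same placeholder path as above.

```sql
-- Hedged sketch: same SerDe and input format as the answer, but the column is
-- an ARRAY of STRUCTs so every matched <AssetCrossReference> is kept.
-- Assumption: the XML SerDe fills the array with one struct per XPath match.
-- test_assets is a hypothetical table name.
create external table test_assets
(
   assets ARRAY<STRUCT<AssetID:STRING,Type:STRING>>
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
with serdeproperties
(
  "column.xpath.assets"="/AssetCrossReferences/AssetCrossReference"
)
stored as inputformat "com.ibm.spss.hive.serde2.xml.XmlInputFormat"
outputformat "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
location "file:///yourfilepath"
tblproperties
(
  "xmlinput.start"="<AssetCrossReferences",
  "xmlinput.end"="</AssetCrossReferences>"
);

-- explode() emits one row per array element (x and a are arbitrary aliases);
-- concat() builds the "AssetID:..., Type:..." text from the struct fields.
select concat('AssetID:', a.AssetID, ', Type:', a.Type) as asset_line
from test_assets
lateral view explode(assets) x as a;
```

Hive lowercases struct field names internally, so `a.assetid` and `a.AssetID` refer to the same field.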
