Stephane Maarek

Reputation: 5352

Trying to load XML data into Hive... wrongly interprets line returns

I am using the following query in Hive:

-- Load XML data into the table
DROP TABLE xmltable;
CREATE TABLE xmltable(xmldata STRING) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/home/user/data-input.xml' OVERWRITE INTO TABLE xmltable;

As it happens, my XML file (which contains only one root element) gets loaded and creates 8 rows instead of the expected one. I think this is because there are line breaks in my file. Is there any way to avoid this (some workaround), or should I pre-process my files beforehand using another tool? (Looking for suggestions here.)

Thanks!

Upvotes: 1

Views: 125

Answers (1)

Kranach

Reputation: 722

Although Hive has a "LINES TERMINATED BY" clause, it only supports a newline ('\n') as the terminator, so a TEXTFILE table will always treat each line of the file as a separate row. So no, there is no easy workaround. You either have to preprocess your file, or use UDFs designed to work with XML files (check the answer to the question linked by Stephanie).
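If you go the preprocessing route, one rough way to do it is to collapse the file onto a single line before running LOAD DATA. The following is only a sketch in Python, not a Hive feature: the function names `flatten_xml` and `flatten_file` are made up for illustration, UTF-8 encoding is an assumption, and it assumes that replacing line breaks with a space is harmless for your data (it will alter text nodes or CDATA sections where line breaks are significant).

```python
import re


def flatten_xml(text: str) -> str:
    """Replace each run of line breaks (plus the whitespace around it)
    with a single space, so the whole document sits on one line."""
    return re.sub(r"\s*[\r\n]+\s*", " ", text).strip()


def flatten_file(src: str, dst: str) -> None:
    """Hypothetical helper: read src, flatten it, write one line to dst.
    UTF-8 is an assumption; adjust to the real encoding of your file."""
    with open(src, encoding="utf-8") as f:
        flat = flatten_xml(f.read())
    with open(dst, "w", encoding="utf-8") as f:
        # One trailing newline so Hive sees exactly one row.
        f.write(flat + "\n")
```

You would then call something like `flatten_file('/home/user/data-input.xml', '/home/user/data-input-flat.xml')` (the output path is just an example) and point LOAD DATA LOCAL INPATH at the flattened file.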

Upvotes: 1
