Reputation: 5352
I am using the following query in Hive:
-- Load XML data into the table
DROP TABLE xmltable;
CREATE TABLE xmltable(xmldata string) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/home/user/data-input.xml' OVERWRITE INTO TABLE xmltable;
As it happens, my XML file (which contains a single root element) gets loaded as 8 rows instead of the expected one. I think this is because there are line breaks in my file. Is there any way to avoid this (some workaround), or should I preprocess my files beforehand using another tool? Suggestions are welcome.
Thanks!
Upvotes: 1
Views: 125
Reputation: 722
Although Hive has a "LINES TERMINATED BY" construct, it only supports the newline character as the terminator, so each line of the file still becomes its own row. So no, there is no easy workaround. You either have to preprocess your file so that the whole document sits on one line, or use UDFs designed to work with XML files (check the answer to the question linked by Stephanie).
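If you go the preprocessing route, something along these lines might do as a starting point. It is only a rough sketch, not a tested pipeline: the function names collapse_to_single_line and flatten_file are made up for illustration, the file is assumed to be UTF-8, and joining lines with a space will alter whitespace inside text nodes or CDATA sections, which may or may not matter for your data.

```python
def collapse_to_single_line(xml_text: str) -> str:
    """Join all non-empty lines of xml_text into one line.

    Lines are stripped and joined with a single space, so a tag whose
    attributes are split across lines stays valid. Whitespace inside
    text nodes is NOT preserved exactly (assumption: that is acceptable).
    """
    pieces = (line.strip() for line in xml_text.splitlines())
    return " ".join(piece for piece in pieces if piece)


def flatten_file(src_path: str, dst_path: str) -> None:
    """Read src_path, collapse it, and write one line to dst_path.

    Both paths are hypothetical; UTF-8 encoding is an assumption.
    """
    with open(src_path, encoding="utf-8") as src:
        flattened = collapse_to_single_line(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        # Single trailing newline so the file ends with exactly one record.
        dst.write(flattened + "\n")


# Example call (paths are placeholders, adjust to your setup):
# flatten_file("/home/user/data-input.xml", "/home/user/data-input-flat.xml")
```

Pointing the LOAD DATA statement at the flattened file should then give one row per document, since the file no longer contains any embedded newlines.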
Upvotes: 1