Reputation: 765
I am trying to load a csv with pipe delimiter to an hive external table. The pipe occurring within data fields are enclosed within quotes. Double quotes occurring within data are escaped with \ . When I configure external table, I see data with double quotes are not interpreted properly.
test.csv
id|name
105|"Test | pipe delim in field"
107|\" Test Escaped single double quote in HIVE
108|\" Test Escaped enclosed double quote in HIVE \"
109|\\" Test Escaped enclosed double quote in HIVE \"
110|\\" Test Escaped enclosed double quote in HIVE \\"
External table create statement
drop table test_schema.hive_test;
CREATE EXTERNAL TABLE test_schema.hive_test (id string, name string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES
(
"separatorChar" = "|",
"quoteChar" = "\"",
"escapeChar" = "\\"
)
LOCATION '/staging/test/hive'
tblproperties ("skip.header.line.count"="1");
Output
+---------------+-------------------------------------------------+
| hive_test.id | hive_test.name |
+---------------+-------------------------------------------------+
| 105 | Test | pipe delim in field |
| 107 | NULL |
| 108 | NULL |
| 109 | NULL |
| 110 | " Test Escaped enclosed double quote in HIVE \ |
+---------------+-------------------------------------------------+
Expected Output
+---------------+-------------------------------------------------+
| hive_test.id | hive_test.name |
+---------------+-------------------------------------------------+
| 105 | Test | pipe delim in field |
| 107 | " Test Escaped single double quote in HIVE |
| 108 | " Test Escaped enclosed double quote in HIVE " |
| 109 | NULL |
| 110 | NULL |
+---------------+-------------------------------------------------+
Open CSV version 2.3
Upvotes: 2
Views: 4644
Reputation: 7947
Sadly it is not possible to achieve because OpenCSV use single character as escape and really you are trying to use double backslash as escape character (which would be a string
). In the OpenCSVSerde class you can find that no matter what you pass as escape character,OpenCSVSerde get the first character of your string value https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/OpenCSVSerde.java#L98
Here is the current code as reference
private char getProperty(final Properties tbl, final String property, final char def) {
final String val = tbl.getProperty(property);
if (val != null) {
return val.charAt(0);
}
return def;
}
I think is missing a warning to let the user know at the moment of the table creation that only single characters are supported.
Upvotes: 0