Reputation: 11
Below is Data
"1"|"2"|"XCEFFDDD - "CV" test"|"3"
I can't change the input data, with the above data when ingesting to snowflake using the FileFormat with options FIELD_DELIMITER = '|', ESCAPE_UNENCLOSED_FIELD = 'NONE', TRIM_SPACE = TRUE, NULL_IF = ('', 'NULL', 'null', '\N'), SKIP_BLANK_LINES = TRUE, REPLACE_INVALID_CHARACTERS = TRUE, FIELD_OPTIONALLY_ENCLOSED_BY = '\"'
the data load is failing with Error: Found character 'C' instead of field delimiter '|' in the file.
any suggestions on how can we surpass this error ?
Upvotes: 0
Views: 2282
Reputation:
"1"|"2"|"XCEFFDDD - "CV" test"|"3"
Your record is un-parseable by standard delimited reading mechanisms. Quotes in delimited records are normally used to allow the data to carry delimiters and newline characters within them, but require you to escape the quotes themselves, i.e. the record ought to be in the following form to be read properly by a regular reader implementation (notice the double double-quotes within the data):
"1"|"2"|"XCEFFDDD - ""CV"" test"|"3"
Whatever system produced these records is not using a standard CSV writer, and is likely writing the surrounding quotes loosely (or the escapes are lost at some other later transformation). It is best to reach out to the data owners to get this addressed.
Most delimited-file reader implementations rely on a state-machine style of record reading, where the appearance of a quote puts them in a mode of field reading until the next quote appears, which your data formatting violates.
That said, you can attempt to load this using unconventional methods if the following conditions apply to your dataset:
|
) characters within a "quoted" fieldIf you're in luck and these rules apply to your data, then you can attempt to manually load it with the following rough steps. Otherwise, you can ignore what follows below.
|
), in preserved-quotes form by not passing any file format options about quoting.SUBSTRING
function to strip the outer quotes from the individual string columns, and load to the destination table.A very rough example of the idea in Snowflake SQL, ignoring situations such as null-character handling, blank lines, etc.:
CREATE TABLE destination_with_four_fields (one STRING, two STRING, three STRING, four STRING);
CREATE TEMPORARY TABLE temp_with_four_fields (one STRING, two STRING, three STRING, four STRING);
COPY INTO temp_with_four_fields FROM @stage/location FILE_FORMAT=(TYPE=CSV, FIELD_DELIMITER='|');
INSERT INTO destination_with_four_fields
SELECT
SUBSTRING(one, 2, LENGTH(one) - 2) one,
SUBSTRING(two, 2, LENGTH(two) - 2) two,
SUBSTRING(three, 2, LENGTH(three) - 2) three,
SUBSTRING(four, 2, LENGTH(four) - 2) four
FROM temp_with_four_fields;
Sample outputs from each step:
> SELECT * FROM temp_with_four_fields;
+-----+-----+------------------------+------+
| ONE | TWO | THREE | FOUR |
|-----+-----+------------------------+------|
| "1" | "2" | "XCEFFDDD - "CV" test" | "3" |
+-----+-----+------------------------+------+
> SELECT * FROM destination_with_four_fields;
+-----+-----+----------------------+------+
| ONE | TWO | THREE | FOUR |
|-----+-----+----------------------+------|
| 1 | 2 | XCEFFDDD - "CV" test | 3 |
+-----+-----+----------------------+------+
Note: You could use TRIM(col, '"')
instead of SUBSTRING(col, …)
above but it may also remove away actual quotes appearing at the beginning or end of the real data.
Upvotes: 3