Sudhir Nune
Sudhir Nune

Reputation: 11

Snowflake unable to load data with double quote in Data as well as for test Values

Below is Data

"1"|"2"|"XCEFFDDD - "CV" test"|"3"

I can't change the input data, with the above data when ingesting to snowflake using the FileFormat with options FIELD_DELIMITER = '|', ESCAPE_UNENCLOSED_FIELD = 'NONE', TRIM_SPACE = TRUE, NULL_IF = ('', 'NULL', 'null', '\N'), SKIP_BLANK_LINES = TRUE, REPLACE_INVALID_CHARACTERS = TRUE, FIELD_OPTIONALLY_ENCLOSED_BY = '\"'

the data load is failing with Error: Found character 'C' instead of field delimiter '|' in the file.

any suggestions on how can we surpass this error ?

Upvotes: 0

Views: 2282

Answers (1)

user13472370
user13472370

Reputation:

"1"|"2"|"XCEFFDDD - "CV" test"|"3"

Your record is un-parseable by standard delimited reading mechanisms. Quotes in delimited records are normally used to allow the data to carry delimiters and newline characters within them, but require you to escape the quotes themselves, i.e. the record ought to be in the following form to be read properly by a regular reader implementation (notice the double double-quotes within the data):

"1"|"2"|"XCEFFDDD - ""CV"" test"|"3"

Whatever system produced these records is not using a standard CSV writer, and is likely writing the surrounding quotes loosely (or the escapes are lost at some other later transformation). It is best to reach out to the data owners to get this addressed.

Most delimited-file reader implementations rely on a state-machine style of record reading, where the appearance of a quote puts them in a mode of field reading until the next quote appears, which your data formatting violates.

That said, you can attempt to load this using unconventional methods if the following conditions apply to your dataset:

  1. None of the record fields carry newline characters within a "quoted" field
  2. None of the record fields carry delimiter (|) characters within a "quoted" field

If you're in luck and these rules apply to your data, then you can attempt to manually load it with the following rough steps. Otherwise, you can ignore what follows below.

  • Load all fields of the input data into a table using the delimiter (|), in preserved-quotes form by not passing any file format options about quoting.
  • Use the SUBSTRING function to strip the outer quotes from the individual string columns, and load to the destination table.

A very rough example of the idea in Snowflake SQL, ignoring situations such as null-character handling, blank lines, etc.:

CREATE TABLE destination_with_four_fields (one STRING, two STRING, three STRING, four STRING);
CREATE TEMPORARY TABLE temp_with_four_fields (one STRING, two STRING, three STRING, four STRING);

COPY INTO temp_with_four_fields FROM @stage/location FILE_FORMAT=(TYPE=CSV, FIELD_DELIMITER='|');

INSERT INTO destination_with_four_fields
  SELECT
    SUBSTRING(one, 2, LENGTH(one) - 2) one,
    SUBSTRING(two, 2, LENGTH(two) - 2) two,
    SUBSTRING(three, 2, LENGTH(three) - 2) three,
    SUBSTRING(four, 2, LENGTH(four) - 2) four
  FROM temp_with_four_fields;

Sample outputs from each step:

> SELECT * FROM temp_with_four_fields;
+-----+-----+------------------------+------+                                     
| ONE | TWO | THREE                  | FOUR |
|-----+-----+------------------------+------|
| "1" | "2" | "XCEFFDDD - "CV" test" | "3"  |
+-----+-----+------------------------+------+

> SELECT * FROM destination_with_four_fields;
+-----+-----+----------------------+------+                                     
| ONE | TWO | THREE                | FOUR |
|-----+-----+----------------------+------|
| 1   | 2   | XCEFFDDD - "CV" test | 3    |
+-----+-----+----------------------+------+

Note: You could use TRIM(col, '"') instead of SUBSTRING(col, …) above but it may also remove away actual quotes appearing at the beginning or end of the real data.

Upvotes: 3

Related Questions