Nestor
Nestor

Reputation: 37

Import data using special escape character

I'm trying to import data into Snowflake using the copy command. I have a file format defined as follows:

CREATE FILE FORMAT mydb.schema1.myFileFormat
TYPE = CSV
COMPRESSION = 'AUTO' 
FIELD_DELIMITER = ',' 
RECORD_DELIMITER = '\n' 
SKIP_HEADER = 0 
FIELD_OPTIONALLY_ENCLOSED_BY = '\042' 
TRIM_SPACE = FALSE 
ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE 
ESCAPE = '\241' 
ESCAPE_UNENCLOSED_FIELD = NONE
DATE_FORMAT = 'AUTO'
TIMESTAMP_FORMAT = 'AUTO' 
NULL_IF = ('\\N') 
COMMENT = '¡ used as escape character';

There's nothing special about the file format, except it's using ¡ as an escape character.

When importing data with this file format, it seems Snowflake is not recognizing the escape character, and it's throwing an error saying "Found character 'XYZ' instead of field delimiter ','".

I tried creating a file with 1 line, like the following:

"ABC123","584382","2","01","01/22/2019","02/08/2019","02/08/2019","04/03/2019","04/03/2019","TEST","Unknown","Unknown","01-884400","Unknown","DACRON CONNECTIONS 15¡"1/2 DIA. X 11¡" LONG FOR EXHAUST DAMPER","","0.0","0.0","0.0","0.0","192.0","USD","2.0","2.0","0","0","96.00000","1","","","","","07882-0047","ASDF","ASDF","02/27/2019","04/06/2021","01/01/1970","0"

This file fails on line 1, char 167, which is right after the first escape character (before the 1 in the following text: CONNECTIONS 15¡"1/2)

Any idea why this is happening?

This is the code I'm running to do the copy

copy into mydb.schema1.mytable from @mydb.schema1.mystage/file-path/2021-05-26/test.txt
file_format = mydb.schema1.myFileFormat
validation_mode = 'return_all_errors';

Upvotes: 1

Views: 742

Answers (1)

Simon D
Simon D

Reputation: 6229

Short Answer Looks like Snowflake only allows single-byte characters to be used as an escape character for a file format. The character you're using as the escape character uses two bytes and therefore isn't allowed as an escape character by the file format.

You can however use multi-byte characters for field and row delimiters so not sure why Snowflake hasn't allowed it as the escape character as well.

Longer Answer The character you're trying to use as the escape character (¡) is two bytes long with a hex value of \xC2\xA1. This isn't allowed as you can see by the following error:

CREATE OR REPLACE FILE FORMAT myFileFormat
    TYPE = CSV
        COMPRESSION = 'AUTO'
        FIELD_DELIMITER = ','
        RECORD_DELIMITER = '\n'
        SKIP_HEADER = 0
        FIELD_OPTIONALLY_ENCLOSED_BY = '\x22' -- Double quotes (")
        ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE
        ESCAPE = '\xC2\xA1' -- Inverted exclamation point (¡)
        DATE_FORMAT = 'AUTO'
        TIMESTAMP_FORMAT = 'AUTO'
        NULL_IF = ('\\N')

invalid value ['\xC2\xA1'] for parameter 'ESCAPE'

On the other hand, if I use the last single-byte character I could possibly use (and is visible), the tilde (~), with a hex value of \x7E (you'd think it should be \xFF but utf-8 uses 7 bits before it goes into 2 bytes. Long story.) then it works fine. I tested this with a file and copy command and it works without issue.

CREATE OR REPLACE FILE FORMAT myFileFormat
    TYPE = CSV
        COMPRESSION = 'AUTO'
        FIELD_DELIMITER = ','
        RECORD_DELIMITER = '\n'
        SKIP_HEADER = 0
        FIELD_OPTIONALLY_ENCLOSED_BY = '\x22' -- Double quotes (")
        ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE
        ESCAPE = '\x7E' -- Tilde (~)
        DATE_FORMAT = 'AUTO'
        TIMESTAMP_FORMAT = 'AUTO'
        NULL_IF = ('\\N')

[2021-05-26 23:49:21] completed in 149 ms

Upvotes: 2

Related Questions