Venkat

Reputation: 51

How to create a csv file format definition to load data into snowflake table

I have a CSV file; a sample of it looks like this: [image of CSV file]

Snowpipe is failing to load this CSV file with the following error:

Number of columns in file (5) does not match that of the corresponding table (3), use file format option error_on_column_count_mismatch=false to ignore this error

Can someone advise me on a CSV file format definition that will accommodate the load without failing?

Upvotes: 5

Views: 8850

Answers (3)

Simon D

Reputation: 6229

The issue is that the data you are trying to load contains commas (,) inside the data itself. Snowflake treats those commas as column delimiters, which is why it sees 5 columns in your file. It then tries to load those 5 columns into a table with only 3 columns, resulting in the error.

You need to tell Snowflake that anything inside double-quotes (") should be loaded as-is, and not to interpret commas inside quotes as column delimiters.
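To see why the quoting option matters, here is a small Python sketch contrasting a naive comma split with a quote-aware CSV parser. The sample record is made up for illustration, since the actual file is only shown as an image:

```python
import csv

# One record whose middle field contains commas inside double quotes.
# (Hypothetical data -- the real file in the question is only shown as an image.)
line = 'Smith,"123 Main St, Apt 4, Springfield",USA'

# A naive split on every comma yields 5 fields -- this mirrors what Snowflake
# does when no enclosing character is configured, hence the 5-vs-3 error.
naive = line.split(",")
print(len(naive))  # 5

# A CSV parser that honors double quotes sees the intended 3 fields.
parsed = next(csv.reader([line]))
print(parsed)  # ['Smith', '123 Main St, Apt 4, Springfield', 'USA']
```

Setting FIELD_OPTIONALLY_ENCLOSED_BY makes Snowflake behave like the quote-aware parser in the second half of the sketch.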

When you create your file-format via the web interface there is an option which allows you to tell Snowflake to do this. Set the "Field optionally enclosed by" dropdown to "Double Quote" like in this picture:

[screenshot of the file format dialog]

Alternatively, if you're creating your file-format with SQL then there is an option called FIELD_OPTIONALLY_ENCLOSED_BY that you can set to \042 which does the same thing:

CREATE FILE FORMAT "SIMON_DB"."PUBLIC".sample_file_format 
    TYPE = 'CSV' 
    COMPRESSION = 'AUTO' 
    FIELD_DELIMITER = ',' 
    RECORD_DELIMITER = '\n' 
    SKIP_HEADER = 0 
    FIELD_OPTIONALLY_ENCLOSED_BY = '\042' -- <---- set to double quote
    TRIM_SPACE = FALSE 
    ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE 
    ESCAPE = 'NONE' 
    ESCAPE_UNENCLOSED_FIELD = '\134' 
    DATE_FORMAT = 'AUTO' 
    TIMESTAMP_FORMAT = 'AUTO';

Upvotes: 4

Francesco Quaratino

Reputation: 590

When the TAB character is unlikely to occur in the data, I tend to use tab-delimited files. Together with a header row, this also makes the source files more human-readable if they need to be opened to troubleshoot loading failures:

FIELD_DELIMITER = '\t'

Also (although a bit off-topic), note that Snowflake recommends compressing data files: https://docs.snowflake.com/en/user-guide/data-load-prepare.html#data-file-compression

I mostly use the GZIP compression type:

COMPRESSION = GZIP

A (working) example:

CREATE FILE FORMAT Public.CSV_GZIP_TABDELIMITED_WITHHEADER_QUOTES_TRIM
    FIELD_DELIMITER = '\t'
    SKIP_HEADER = 1
    TRIM_SPACE = TRUE
    NULL_IF = ('NULL')
    COMPRESSION = GZIP
;
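As a sketch of producing a file in that shape from Python (the file name and sample rows are made up for illustration), the standard csv and gzip modules can write a gzip-compressed, tab-delimited file with a header row:

```python
import csv
import gzip

# Hypothetical sample data: a header row plus two records.
rows = [("id", "name", "country"), (1, "Ada", "UK"), (2, "Linus", "FI")]

# Write a gzip-compressed, tab-delimited file with a header row -- the shape
# that a file format like the one above (SKIP_HEADER = 1, FIELD_DELIMITER =
# '\t', COMPRESSION = GZIP) expects.
with gzip.open("sample.tsv.gz", "wt", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerows(rows)

# Read it back to verify the round trip.
with gzip.open("sample.tsv.gz", "rt", newline="") as f:
    data = list(csv.reader(f, delimiter="\t"))
print(data[0])  # ['id', 'name', 'country']
```

A file written this way can be staged and loaded with COPY INTO using the file format above.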

Upvotes: 0

Sriga

Reputation: 1321

If possible, share the file format and one sample record so we can figure out the issue. It seems to be a problem with the number of columns. Can you include the FIELD_OPTIONALLY_ENCLOSED_BY option in your file format and try again?

Upvotes: 0
