Reputation: 2847
I've got a table defined in Snowflake as:
BATCH_KEY NUMBER(38,0) NULL
CTACCT VARCHAR(100) NULL
CTPAGE NUMBER(38,0) NULL
and a file that looks like this:
CTACCT VARCHAR(100)
CTPAGE NUMBER(38,0)
example:
CTACCT,CTPAGE
"Test Account",100
"Second Account", 200
My copy into command looks like this:
copy into GLPCT_POC
from 'azure://ouraccount.blob.core.windows.net/landing/GLPCT'
credentials=(azure_sas_token='<SAS_TOKEN>')
file_format=(TYPE=CSV, SKIP_HEADER = 1, FIELD_OPTIONALLY_ENCLOSED_BY='"');
Snowflake is throwing an error due to a column number mismatch. How can I get Snowflake to ignore the column that isn't present in the file and not throw an error? I can move BATCH_KEY to the end of the table if that will help.
Upvotes: 7
Views: 21632
Reputation: 176124
How can I get Snowflake to ignore the column that isn't present in the file and not throw an error?
It is possible by using the MATCH_BY_COLUMN_NAME parameter:
MATCH_BY_COLUMN_NAME = CASE_SENSITIVE | CASE_INSENSITIVE | NONE
Definition
String that specifies whether to load semi-structured data into columns in the target table that match corresponding columns represented in the data.
For loading CSV files, the MATCH_BY_COLUMN_NAME copy option is available in preview. It requires the use of the CSV file format option PARSE_HEADER = TRUE.
The COPY command:
COPY INTO GLPCT_POC FROM 'azure://ouraccount.blob.core.windows.net/landing/GLPCT'
credentials=(azure_sas_token='<SAS_TOKEN>')
file_format=(TYPE=CSV, PARSE_HEADER = TRUE, FIELD_OPTIONALLY_ENCLOSED_BY='"')
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
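Note that this relies on the file's header row (CTACCT,CTPAGE here) matching the table's column names; table columns with no match in the file, such as BATCH_KEY, are loaded with NULL.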
Upvotes: 3
Reputation: 11
Snowflake allows you to set the ERROR_ON_COLUMN_COUNT_MISMATCH option in the file format.
ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE | FALSE
Definition
Boolean that specifies whether to generate a parsing error if the number of delimited columns (i.e. fields) in an input data file does not match the number of columns in the corresponding table.
If set to FALSE, an error is not generated and the load continues. If the file is successfully loaded:
If the input file contains records with more fields than columns in the table, the matching fields are loaded in order of occurrence in the file and the remaining fields are not loaded.
If the input file contains records with fewer fields than columns in the table, the non-matching columns in the table are loaded with NULL values.
https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html#type-csv
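Applied to the question, a minimal sketch (reusing the URL and SAS token placeholder from the question) might look like:
-- ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE lets the load continue when the
-- file has fewer fields than the table has columns; the unmatched trailing
-- table columns are loaded with NULL.
copy into GLPCT_POC
from 'azure://ouraccount.blob.core.windows.net/landing/GLPCT'
credentials=(azure_sas_token='<SAS_TOKEN>')
file_format=(TYPE=CSV, SKIP_HEADER = 1, FIELD_OPTIONALLY_ENCLOSED_BY='"',
             ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE);
Because fields are matched to table columns in order of occurrence, this only works here if BATCH_KEY is moved to the end of the table, as the asker offered; otherwise CTACCT from the file would land in BATCH_KEY.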
Upvotes: 1
Reputation: 2847
It appears it's possible to specify which columns to load in a COPY INTO statement, so ours becomes:
copy into GLPCT_POC (CTACCT, CTPAGE)
from 'azure://ouraccount.blob.core.windows.net/landing/GLPCT'
credentials=(azure_sas_token='<SAS_TOKEN>')
file_format=(TYPE=CSV, SKIP_HEADER = 1, FIELD_OPTIONALLY_ENCLOSED_BY='"');
We could not use a transformation as suggested in another answer, because we were loading directly from an external URL rather than from a stage.
Upvotes: 7
Reputation: 2880
You can add a "transformation" as you pull data in with a copy into query. In this case your transformation can be to add a NULL column.
However, in order to use this feature, you need to create a stage for your external source:
create or replace stage my_stage
url='azure://ouraccount.blob.core.windows.net/landing/GLPCT'
credentials=(azure_sas_token='<SAS_TOKEN>')
file_format=(TYPE=CSV, SKIP_HEADER = 1, FIELD_OPTIONALLY_ENCLOSED_BY='"');
copy into GLPCT_POC
from (SELECT NULL, $1, $2 FROM @my_stage);
The $1 and $2 refer to the first and second columns in the file, and the order of the expressions in the SELECT clause lines up with the columns of the table.
An extra benefit is that if you reuse that copy statement and/or stage, you don't need to repeat all the credential and file format information.
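As a minimal sketch (assuming the my_stage definition above), you can also query the stage directly to check how $1 and $2 are parsed before running the load:
-- Preview the staged file through the stage's attached file format.
SELECT $1, $2 FROM @my_stage LIMIT 10;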
See Data load with transformation syntax
Upvotes: 4