Reputation: 1
I am facing an issue where the columns of the extracted files can change from day to day.
For example:
On day 1, a file might have 3 columns: c1,c2,c3
On day 2, a file might have 5 columns: c1,c3,c2,c4,c5
Notice the position of column c3 in the second file.
Using the "COPY INTO from external stage" syntax in Snowflake won't work, since c3 now appears right after c1 instead of after c2. I tried an external table, but it also requires columns to be referenced by position. Has anyone worked out how to load these types of files?
Upvotes: 0
Views: 193
Reputation: 2850
You don't tell us anything about the file format being used.
The only way to load files with varying columns is to load each record as a single column (e.g. with FIELD_DELIMITER = NONE), then split it and convert it into an OBJECT with each file column as an attribute.
If the first record contains the field names c1 ... cn, you can load with:
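As a minimal sketch of that first step, a COPY INTO with FIELD_DELIMITER = NONE loads each raw line into a one-column table (the table and stage names here are hypothetical; adjust to your environment):

```sql
-- Hypothetical landing table: one VARCHAR column holding the raw line.
CREATE OR REPLACE TABLE raw_lines (rec VARCHAR);

-- FIELD_DELIMITER = NONE keeps the whole line as a single field.
COPY INTO raw_lines
FROM @my_stage/daily/
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = NONE);
```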
WITH
file AS (
    SELECT * FROM VALUES ('c1,c2,c3'), ('1,2,3'), ('11,22,33') t(REC)
),
split_file AS (
    SELECT * FROM file CROSS JOIN LATERAL SPLIT_TO_TABLE(REC, ',')
),
combined_table AS (
    SELECT content.SEQ - 1 REC_NO, OBJECT_AGG(headers.VALUE, content.VALUE::VARIANT) OBJ
    FROM split_file content
    INNER JOIN split_file headers
        ON content.INDEX = headers.INDEX AND content.SEQ > 1 AND headers.SEQ = 1
    GROUP BY content.SEQ
)
SELECT OBJ:c1::NUMBER c1, OBJ:c2::NUMBER c2, OBJ:c3::NUMBER c3, OBJ:c4::NUMBER c4
FROM combined_table;
The example above combines everything into a single query, but in your case you would aggregate each file separately and INSERT (append) into combined_table.
The reason this works is that you can reference object attributes (columns) that are not present (e.g. c4), and they will be substituted with NULL.
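A per-file append could be sketched like this, assuming the raw lines for one file have been loaded into a hypothetical raw_lines table as above (names are illustrative, not from the original answer):

```sql
-- Hypothetical persistent table holding all files' rows as OBJECTs.
CREATE TABLE IF NOT EXISTS combined_table (rec_no NUMBER, obj OBJECT);

-- Split one file's lines, pair data rows with the header row, and append.
INSERT INTO combined_table
WITH split_file AS (
    SELECT * FROM raw_lines CROSS JOIN LATERAL SPLIT_TO_TABLE(REC, ',')
)
SELECT content.SEQ - 1, OBJECT_AGG(headers.VALUE, content.VALUE::VARIANT)
FROM split_file content
INNER JOIN split_file headers
    ON content.INDEX = headers.INDEX AND content.SEQ > 1 AND headers.SEQ = 1
GROUP BY content.SEQ;
```

Querying obj:c4 on rows loaded from a file that lacked c4 then simply returns NULL, which is what makes the varying layouts safe to combine.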
Upvotes: 1