Reputation: 11
I am trying to load about 35k CSV files from AWS S3 into Snowflake. The files range from 10 MB to 50 MB, although I've seen a couple of files exceed 50 MB.
I tried building a Snowpipe on top of the S3 bucket; however, refreshing the pipe will not pick up files staged more than 7 days ago. Therefore, I resorted to bulk loading the files into Snowflake using the COPY INTO statement below:
COPY INTO HS_TABLE
FROM @S3_STAGE
FILE_FORMAT = (FORMAT_NAME = 'HS_FILE_FORMAT')
ON_ERROR = 'CONTINUE';
I am copying the data from an external stage (S3_STAGE) and using an X-Small warehouse. This query has been running for 8.5 hours now. Does anyone know a faster way to load such a large amount of historical data into Snowflake? Is there another way to utilize Snowpipe for historical data? Or am I just doing something wrong? I could seriously use some help!
Upvotes: 1
Views: 37
Reputation: 169
Thanks for the detailed background on the problem. An X-Small warehouse has limited compute capacity, so only a limited number of files are ingested concurrently. You have tens of thousands of files of 10-50 MB each, so using a larger warehouse (Large, X-Large) should parallelize the bulk-loading operation. Refer to the Snowflake docs on improving bulk-load performance.
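For example, a minimal sketch of scaling up a dedicated loading warehouse just for the backfill (the warehouse name LOAD_WH is illustrative, not from your setup, and the FILE_FORMAT clause is spelled out with the FORMAT_NAME syntax):

-- Scale up for the backfill, then run the bulk load on that warehouse
ALTER WAREHOUSE LOAD_WH SET WAREHOUSE_SIZE = 'LARGE';
USE WAREHOUSE LOAD_WH;

COPY INTO HS_TABLE
FROM @S3_STAGE
FILE_FORMAT = (FORMAT_NAME = 'HS_FILE_FORMAT')
ON_ERROR = 'CONTINUE';

-- Scale back down once the backfill finishes to control credit usage
ALTER WAREHOUSE LOAD_WH SET WAREHOUSE_SIZE = 'XSMALL';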
Regarding why you cannot load the historical files using Snowpipe: Snowflake has no record of files that landed in the bucket before the SQS event notifications were set up. The Snowflake docs include a step for loading historical files with a bulk load; see https://docs.snowflake.com/en/user-guide/data-load-snowpipe-auto-s3#step-5-load-historical-files
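The pattern in that doc looks roughly like the sketch below (the pipe name HS_PIPE is illustrative): create the auto-ingest pipe first so new files are picked up going forward, then backfill the existing files with a one-off COPY INTO. A pipe refresh only covers files staged within the last 7 days.

-- Auto-ingest pipe handles files that arrive from now on
CREATE PIPE HS_PIPE AUTO_INGEST = TRUE AS
COPY INTO HS_TABLE
FROM @S3_STAGE
FILE_FORMAT = (FORMAT_NAME = 'HS_FILE_FORMAT');

-- One-off bulk load backfills the historical files already in the bucket
COPY INTO HS_TABLE
FROM @S3_STAGE
FILE_FORMAT = (FORMAT_NAME = 'HS_FILE_FORMAT')
ON_ERROR = 'CONTINUE';

-- Files staged within the last 7 days can also be picked up with a refresh
ALTER PIPE HS_PIPE REFRESH;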
Upvotes: 0