Martin Thoma
Martin Thoma

Reputation: 136645

What does the suffix mean when unloading with Snowflake to S3?

When I unload to S3 like this:

COPY INTO '@stage/foobar'
FROM some.table
file_format = (type = csv,
               RECORD_DELIMITER = '\n',
               FIELD_DELIMITER=',',
               FIELD_OPTIONALLY_ENCLOSED_BY='"')
SINGLE = FALSE
HEADER = TRUE

I get multiple files on S3. I executed dumps of this type twice and both times it was foobar_[digit]_[digit]_[digit].csv.gz, but they look different:

# files dumped the first time
foobar_0_0_0.csv.gz
foobar_0_1_0.csv.gz
foobar_0_2_0.csv.gz
foobar_0_3_0.csv.gz
foobar_0_4_0.csv.gz
foobar_0_5_0.csv.gz
foobar_0_6_0.csv.gz
foobar_0_7_0.csv.gz
foobar_1_0_0.csv.gz
foobar_1_2_0.csv.gz
foobar_1_3_0.csv.gz
foobar_1_4_0.csv.gz
foobar_1_5_0.csv.gz
foobar_1_7_0.csv.gz

# files dumped the second time
foobar_0_1_0.csv.gz
foobar_0_2_0.csv.gz
foobar_0_3_0.csv.gz
foobar_0_4_0.csv.gz
foobar_0_5_0.csv.gz
foobar_0_6_0.csv.gz
foobar_0_7_0.csv.gz
foobar_1_0_0.csv.gz
foobar_1_1_0.csv.gz
foobar_1_2_0.csv.gz
foobar_1_3_0.csv.gz
foobar_1_4_0.csv.gz
foobar_1_7_0.csv.gz

What do the suffixes mean?

There are three things that confuse me:

  1. The last digit is not used
  2. The second digit starts counting up (0-7), but stops at 7.
  3. Why does 0_0_0 not exist for the second dump?

Upvotes: 1

Views: 677

Answers (1)

Ngenator
Ngenator

Reputation: 11269

Those suffixes are just to ensure unique names across parallel executions but it isn't significant other than that. You can adjust the number of files it creates during an unload by using the MAX_FILE_SIZE copy option or disable unloading into multiple files by setting SINGLE = TRUE.

https://docs.snowflake.net/manuals/user-guide/data-unload-overview.html#bulk-unloading-into-single-or-multiple-files

Upvotes: 3

Related Questions