Reputation: 1297
When I run my COPY command to copy all the files from an S3 folder to a Redshift table, it fails with "ERROR: gzip: unexpected end of stream. Unknown zlib error code. zlib error code: -1":
copy table_name
(column_list)
from 's3://bucket_name/folder_name/'
credentials 'aws_access_key_id=xxxxxx;aws_secret_access_key=xxxxxxxxx'
delimiter '|' GZIP
However, when I specify a file prefix for the files within the folder, it succeeds:
copy table_name
(column_list)
from 's3://bucket_name/folder_name/file_prefix'
credentials 'aws_access_key_id=xxxxxx;aws_secret_access_key=xxxxxxxxx'
delimiter '|' GZIP
The files are GZIP-ed.
The AWS documentation doesn't explicitly state that specifying just the folder_name is enough for the COPY command to load the whole contents of that folder, but I do get an error.
Has anyone encountered similar issues? Is a file prefix required for this kind of operation?
Upvotes: 4
Views: 12152
Reputation: 308
For me, the issue was that the manifest file still had the path of the originally unloaded .gz file written inside it. Delete the manifest file and the COPY command will read the gzip files successfully from the path you specify in the command itself.
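If you want to check what a manifest points at before deleting it, a minimal sketch like the following prints its entries (the bucket and key names here are placeholders, not the actual paths):

import json
import boto3

s3 = boto3.client("s3")

# Placeholder bucket/key; point these at your own manifest object.
obj = s3.get_object(Bucket="bucket_name", Key="folder_name/unload_manifest")
manifest = json.loads(obj["Body"].read())

# A Redshift manifest is JSON with an "entries" list of {"url": ..., "mandatory": ...}.
for entry in manifest.get("entries", []):
    print(entry["url"])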
Upvotes: 0
Reputation: 21
I encountered the same issue, and in my case the gzip files were fine: when I used the COPY command with the exact file name, it worked.
The problem was caused by the application "S3 Browser". When you create directories with it, it creates some extra hidden files inside them, and when the COPY command tries to read the files in the directory, it also reads those hidden files, which are not valid gzip, and throws the error. A quick way to spot such stray objects is to list everything under the folder prefix, as in the sketch below.
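This is a rough boto3 sketch that lists every object COPY would try to load and prints its size (bucket and prefix names are placeholders):

import boto3

s3 = boto3.client("s3")

# Placeholder bucket/prefix; COPY tries to load every object matching this prefix.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="bucket_name", Prefix="folder_name/"):
    for obj in page.get("Contents", []):
        # Zero-byte keys, or keys that aren't your data files, are the likely culprits.
        print(obj["Key"], obj["Size"])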
Upvotes: 2
Reputation: 14035
One of your gzipped files is not properly formed. A gzip stream ends with a trailer containing a checksum and length, and the file can't be fully expanded without it.
If the file does not get fully written, e.g., you run out of disk space, then you get the error you're seeing when you attempt to load it into Redshift.
Speaking from experience… ;-)
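If you need to find the offending file, one option is to test-decompress each object under the prefix before loading; a rough sketch, assuming the files fit in memory (bucket and prefix names are placeholders):

import gzip
import boto3

s3 = boto3.client("s3")

# Placeholder bucket/prefix; adjust to match the FROM path in your COPY command.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="bucket_name", Prefix="folder_name/"):
    for obj in page.get("Contents", []):
        body = s3.get_object(Bucket="bucket_name", Key=obj["Key"])["Body"].read()
        try:
            gzip.decompress(body)  # raises if the stream is truncated or corrupt
            print("OK     ", obj["Key"])
        except (OSError, EOFError) as exc:
            print("BROKEN ", obj["Key"], exc)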
Upvotes: 5