Reputation: 1219
I am trying to copy data from an S3 bucket into Redshift using the COPY command.
When I pass the exact file name in the COPY command it works perfectly fine, but when I use a wildcard (*) in my FROM clause it throws an error: "The specified S3 prefix 'suiteX*' does not exist;"
All my files follow this naming convention: SuiteX_Date_Hour.csv
For example: SuiteX_20150101_1.csv
Here is my COPY command that works:
copy mytable from 's3://mybucket/suiteX_20150101_1.csv'
CREDENTIALS 'aws_access_key_id=XXXXX;aws_secret_access_key=XXXX'
delimiter ',' REGION AS 'us-east-1';
Here is the COPY command that does not work:
copy mytable from 's3://mybucket/suiteX*'
CREDENTIALS 'aws_access_key_id=XXXXX;aws_secret_access_key=XXXX'
delimiter ',' REGION AS 'us-east-1';
Any idea how I can make the COPY command work with a wildcard when copying data from S3?
Upvotes: 7
Views: 17947
Reputation: 1160
The Redshift COPY command doesn't have an explicit wildcard syntax. The object path you provide is treated as a prefix, and any objects whose keys match that prefix will be copied. If the prefix matches multiple folders, all objects in all of those folders will be copied.
You should be able to get it to work for your example with:
copy mytable
from 's3://mybucket/suiteX'
CREDENTIALS 'aws_access_key_id=XXXXX;aws_secret_access_key=XXXX'
delimiter ','
REGION AS 'us-east-1';
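Because the path is matched as a prefix, you can also narrow the load by extending the prefix. For example, assuming your files follow the naming convention above, this should load only the files for 2015-01-01:
copy mytable
from 's3://mybucket/suiteX_20150101'
CREDENTIALS 'aws_access_key_id=XXXXX;aws_secret_access_key=XXXX'
delimiter ','
REGION AS 'us-east-1';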
The relevant section of the COPY from Amazon S3 docs says:
FROM 's3://copy_from_s3_objectpath'
Specifies the path to the Amazon S3 objects that contain the data—for example, 's3://mybucket/cust.txt'. The s3://copy_from_s3_objectpath parameter can reference a single file or a set of objects or folders that have the same key prefix. For example, the name custdata.txt is a key prefix that refers to a number of physical files: custdata.txt.1, custdata.txt.2, and so on. The key prefix can also reference a number of folders. For example, 's3://mybucket/custfolder' refers to the folders custfolder_1, custfolder_2, and so on. If a key prefix references multiple folders, all of the files in the folders will be loaded.
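If prefix matching is too coarse and you need to load an exact set of files instead, COPY also accepts a manifest file. Here is a minimal sketch; the manifest key and file names are illustrative:
-- Contents of s3://mybucket/suiteX.manifest (a JSON file uploaded to S3):
-- {"entries": [
--   {"url": "s3://mybucket/suiteX_20150101_1.csv", "mandatory": true},
--   {"url": "s3://mybucket/suiteX_20150101_2.csv", "mandatory": true}
-- ]}
copy mytable
from 's3://mybucket/suiteX.manifest'
CREDENTIALS 'aws_access_key_id=XXXXX;aws_secret_access_key=XXXX'
delimiter ','
REGION AS 'us-east-1'
MANIFEST;
With MANIFEST, the FROM path points at the manifest object itself rather than at a data prefix.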
Upvotes: 12
Reputation: 2757
According to the Redshift docs, I don't think the COPY command supports wildcards in the S3 source path. Instead, you can specify a bucket directory as follows:
copy data
from 's3://mybucket/data/'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>';
In this case, all the files under the data/ directory will be copied into Redshift.
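Since the path works as a prefix, you can also point COPY at a narrower sub-path. For example, assuming the files are organized into date-named subfolders (an illustrative layout), this would load only one day's folder:
copy data
from 's3://mybucket/data/20150101/'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>';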
Upvotes: 4