Moe Chughtai

Reputation: 384

Data Copy from S3 to Redshift: Manifest is in a different bucket than the files I need to download

I am trying to copy data from a large number of files in S3 over to Redshift. I have read-only access to the S3 bucket which contains these files. In order to COPY them efficiently, I created a manifest file that lists the path of each file I need copied over (its contents are sketched below).

Bucket 1:
 - file1.gz
 - file2.gz
 - ...

Bucket 2:
 - manifest
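
The manifest itself follows the standard Redshift manifest format, so its contents look roughly like this (the entries mirror the layout above):

{
  "entries": [
    {"url": "s3://bucket-1/file1.gz", "mandatory": true},
    {"url": "s3://bucket-1/file2.gz", "mandatory": true}
  ]
}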

Here is the command I've tried to copy data from bucket 1 using the manifest in bucket 2:

-- Load data from s3
copy data_feed_eval from 's3://bucket-2/data_files._manifest'
CREDENTIALS 'aws_access_key_id=bucket_1_key;aws_secret_access_key=bucket_1_secret'
manifest
csv gzip delimiter ',' dateformat 'YYYY-MM-DD' timeformat 'YYYY-MM-DD HH:MI:SS'
maxerror 1000 TRUNCATECOLUMNS;

However, when running this command, I get the following error:

09:45:32  [COPY - 0 rows, 7.576 secs]  [Code: 500310, SQL State: XX000]  [Amazon](500310) Invalid operation: Problem reading manifest file - S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid 901E02533CC5010D,ExtRid tEvf/TVfZzPfSNAFa8iTYjTBjvaHnMMPmuwss58SwopY/sZSkhUBe3yMGHTDyA0yDhDCD7ybX9gl45pV/eQ=,CanRetry 1
Details: 
 -----------------------------------------------
  error:  Problem reading manifest file - S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid 901E02533CC5010D,ExtRid tEvf/TVfZzPfSNAFa8iTYjTBjvaHnMMPmuwss58SwopY/sZSkhUBe3yMGHTDyA0yDhDCD7ybX9gl45pV/eQ=,CanRetry 1
  code:      8001
  context:   s3://bucket-2/data_files._manifest
  query:     2611231
  location:  s3_utility.cpp:284
  process:   padbmaster [pid=10330]
  -----------------------------------------------;

I believe the issue is that I'm passing bucket_1 credentials in my COPY command. Is it possible to pass credentials for multiple buckets (bucket_1 with the actual files, and bucket_2 with the manifest) to the COPY command? How should I approach this, given that I don't have write access to bucket_1?

Upvotes: 0

Views: 2990

Answers (1)

Pawel

Reputation: 626

You have indicated that the bucket_1_key key (which belongs to an IAM user) has permissions limited to read-only access on bucket_1. If that is the case, the error occurs because that key has no permission to read from bucket_2. You mentioned this as a possible cause already, and it is exactly that.

There is no option to supply two sets of keys to the COPY command, but you should consider the following options:

Option 1

According to the documentation linked below, "You can specify the files to be loaded by using an Amazon S3 object prefix or by using a manifest file." If there is a common prefix for the set of files you want to load, you can point COPY at that prefix in bucket_1 and drop the manifest entirely (see the sketch below). See http://docs.aws.amazon.com/redshift/latest/dg/t_loading-tables-from-s3.html
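
As a sketch, assuming (hypothetically) that your files share a prefix such as data_files/ in bucket_1, the COPY command would become:

-- Hypothetical sketch: 'data_files/' stands in for whatever common
-- prefix the .gz files in bucket_1 actually share. Note the manifest
-- keyword is dropped, and the FROM path now points at bucket_1.
copy data_feed_eval from 's3://bucket-1/data_files/'
CREDENTIALS 'aws_access_key_id=bucket_1_key;aws_secret_access_key=bucket_1_secret'
csv gzip delimiter ',' dateformat 'YYYY-MM-DD' timeformat 'YYYY-MM-DD HH:MI:SS'
maxerror 1000 TRUNCATECOLUMNS;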

You have mentioned you have read-only access to bucket 1. Make sure that access is sufficient, as defined in http://docs.aws.amazon.com/redshift/latest/dg/copy-usage_notes-access-permissions.html#copy-usage_notes-iam-permissions

All the other options require changes to your key/IAM user permissions or to Redshift itself.

Option 2

Extend the permissions of the bucket_1_key key so it can read from bucket_2 as well. You will have to make sure that bucket_1_key has LIST access to bucket_2 and GET access for the bucket_2 objects (as documented in the permissions page linked under Option 1). This way you can continue using bucket_1_key in the COPY command. This method is referred to as key-based access control and uses a plain-text access key ID and secret access key; AWS recommends using role-based access control (Option 3) instead. A sketch of the extended policy is below.
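
A minimal sketch of what the extended IAM policy might look like; the bucket names are taken from the question, and the exact statements depend on your setup:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": ["arn:aws:s3:::bucket-1", "arn:aws:s3:::bucket-2"]
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": ["arn:aws:s3:::bucket-1/*", "arn:aws:s3:::bucket-2/*"]
    }
  ]
}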

Option 3

Use an IAM role in the COPY command instead of keys (Option 2). This is referred to as role-based access control, and it is the strongly recommended authentication option for the COPY command. This IAM role would need LIST access on buckets 1 and 2 and GET access for the objects in those buckets (see the sketch below).
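
A sketch of the role-based version of your command; the role ARN is a placeholder for a role attached to your Redshift cluster that carries the S3 permissions above:

-- Hypothetical sketch: replace the ARN with the role attached to
-- your cluster; everything else mirrors the original command.
copy data_feed_eval from 's3://bucket-2/data_files._manifest'
CREDENTIALS 'aws_iam_role=arn:aws:iam::<aws-account-id>:role/<role-name>'
manifest
csv gzip delimiter ',' dateformat 'YYYY-MM-DD' timeformat 'YYYY-MM-DD HH:MI:SS'
maxerror 1000 TRUNCATECOLUMNS;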

More information about key-based and role-based access control is in the access-permissions documentation linked under Option 1.

Upvotes: 1
