ccleve

Reputation: 15809

Import Postgres data into RDS using S3 and aws_s3

I'm having a hard time importing data from S3 into an RDS postgres instance. According to the docs, you can use this syntax:

aws_s3.table_import_from_s3 (
   table_name text, 
   column_list text, 
   options text, 
   bucket text, 
   file_path text, 
   region text, 
   access_key text, 
   secret_key text, 
   session_token text 
) 

So, in pgAdmin, I did this:

SELECT aws_s3.table_import_from_s3(
  'contacts_1', 
  'firstname,lastname,imported', 
  '(format csv)',
  'com.foo.mybucket', 
  'mydir/subdir/myfile.csv', 
  'us-east-2',
  'AKIAYYXUMxxxxxxxxxxx',
  '3zB4S5jb1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
);

I also tried it with an explicit NULL for the last parameter.

The error message I get is:

NOTICE:  CURL error code: 51 when attempting to validate pre-signed URL, 1 attempt(s) remaining
NOTICE:  CURL error code: 51 when attempting to validate pre-signed URL, 0 attempt(s) remaining

ERROR:  Unable to generate pre-signed url, look at engine log for details.
SQL state: XX000

I checked the server logs and there was no further information.

I have triple-checked the correctness of all the parameters. How do I make this work?

UPDATE:

I can confirm that I can do an s3.getObject() with the Java AWS SDK using these same credentials.

Upvotes: 28

Views: 36111

Answers (9)

Jing

Reputation: 131

I found out this only works for RDS PostgreSQL (the standalone DB instance offering), not for Aurora PostgreSQL (the cluster offering).

I am seeing the same error on an Aurora cluster when copying data from S3.

SQL Error [XX000]: ERROR: Unable to generate pre-signed url, look at engine log for details.
  Where: SQL function "table_import_from_s3" statement 1

To triage, I created an RDS Aurora PostgreSQL DB cluster and an RDS PostgreSQL DB instance using the same VPC, private subnets, VPC endpoint, IAM role and policies, and security groups.

For Aurora PostgreSQL you can only attach the rds-s3-import-role role to the cluster, not to the writer instance:

aws rds add-role-to-db-cluster \
   --feature-name s3Import \
   --db-cluster-identifier arn:aws:rds:eu-west-1:12345678900:cluster:playground-cluster \
   --role-arn arn:aws:iam::12345678900:role/rds-s3-import-role \
   --region eu-west-1
Attempting to attach the role to the writer instance instead fails:

aws rds add-role-to-db-instance \
   --db-instance-identifier magna-playground \
   --feature-name s3Import \
   --role-arn arn:aws:iam::12345678900:role/rds-s3-import-role \
   --region eu-west-1

An error occurred (InvalidDBInstanceState) when calling the AddRoleToDBInstance operation: The magna-playground DB instance is associated with a database cluster. Manage the arn:aws:iam::123456789000:role/rds-s3-import-role IAM role from the cluster instead of from the DB instance.

For an RDS PostgreSQL instance, this works fine:

aws rds add-role-to-db-instance \
   --db-instance-identifier test-db-1-s3-copy \
   --feature-name s3Import \
   --role-arn arn:aws:iam::123456789000:role/rds-s3-import-role \
   --region eu-west-1

I think the problem is that the cluster's writer node needs the IAM role attached to it directly in order to access S3; granting the role at the cluster level doesn't seem to work.
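To see whether the role is actually attached and active for the s3Import feature, you can inspect the cluster's associated roles (a quick check using the same cluster identifier as in the commands above):

aws rds describe-db-clusters \
   --db-cluster-identifier playground-cluster \
   --query 'DBClusters[0].AssociatedRoles'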

Upvotes: 2

Giuseppe Broccolo

Reputation: 455

The main issue here is that you need to 1) add an IAM role to the RDS instance to access the S3 bucket, and 2) add an S3 endpoint to the VPC where the RDS instance runs in order to allow communication.

This is the procedure I followed to make it work, using AWS CLI commands in a shell (take care to set the environment variables involved to the proper values); I hope it helps:

  1. Create the IAM role:
$ aws iam create-role \
    --role-name $ROLE_NAME \
    --assume-role-policy-document '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Principal": {"Service": "rds.amazonaws.com"}, "Action": "sts:AssumeRole"}]}'
  2. Create the IAM policy that will be attached to the IAM role (note the quoting: the single-quoted string is closed and reopened around $BUCKET_NAME so the shell expands it):
$ aws iam create-policy \
    --policy-name $POLICY_NAME \
    --policy-document '{"Version": "2012-10-17", "Statement": [{"Sid": "s3import", "Action": ["s3:GetObject", "s3:ListBucket"], "Effect": "Allow", "Resource": ["arn:aws:s3:::'"$BUCKET_NAME"'", "arn:aws:s3:::'"$BUCKET_NAME"'/*"]}]}'
  3. Attach the policy to the role:
$ aws iam attach-role-policy \
    --policy-arn arn:aws:iam::$AWS_ACCOUNT_ID:policy/$POLICY_NAME \
    --role-name $ROLE_NAME
  4. Add the role to a specific instance - this step needs to be repeated for every new instance:
$ aws rds add-role-to-db-instance \
    --db-instance-identifier $RDS_INSTANCE_NAME \
    --feature-name s3Import \
    --role-arn arn:aws:iam::$AWS_ACCOUNT_ID:role/$ROLE_NAME \
    --region $REGION
  5. Create the VPC endpoint for the S3 service:
$ aws ec2 create-vpc-endpoint \
    --vpc-id $VPC_ID \
    --service-name com.amazonaws.$REGION.s3 \
    --route-table-ids $ROUTE_TABLE_ID

The route table ID associated with the VPC where the endpoint is created can be retrieved with:

$ aws ec2 describe-route-tables | jq -r '.RouteTables[] | "\(.VpcId) \(.RouteTableId)"'
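As a sanity check after step 4, you can confirm that the role is actually associated with the instance and active for the s3Import feature (a quick verification sketch, using the same $RDS_INSTANCE_NAME variable as above):

$ aws rds describe-db-instances \
    --db-instance-identifier $RDS_INSTANCE_NAME \
    --query 'DBInstances[0].AssociatedRoles'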

Upvotes: 29

primordius

Reputation: 31

I had a similar problem: Postgres in a private subnet.

I solved it with a VPC endpoint for S3.
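If you're not sure whether such an endpoint already exists, you can list the S3 endpoints in your VPC (a quick check; the VPC ID and region below are placeholders to adapt):

aws ec2 describe-vpc-endpoints \
   --filters Name=vpc-id,Values=vpc-0123456789abcdef0 Name=service-name,Values=com.amazonaws.us-east-2.s3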

Upvotes: 1

wrsenn

Reputation: 23

One cause of this issue, assuming you've otherwise configured everything correctly, may be Network ACLs. Make sure there are NACL rules explicitly allowing connectivity between the DB and S3.

In our case, we used the terraform-aws-modules/vpc/aws module, which handles broad VPC configuration, including NACLs. After manually inspecting the resulting configuration in the AWS Console, we found that the rules generated from the configuration we provided to the module were too restrictive. This wasn't reflected in any error messages, as the requests from RDS to S3 were simply blocked at the network level.
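For reference, an NACL egress rule allowing HTTPS out of the subnet can be added with something like the following (a sketch only; the ACL ID, rule number, and CIDR are placeholders, and because NACLs are stateless you also need an inbound rule allowing the ephemeral return ports):

aws ec2 create-network-acl-entry \
   --network-acl-id acl-0123456789abcdef0 \
   --egress \
   --rule-number 110 \
   --protocol tcp \
   --port-range From=443,To=443 \
   --cidr-block 0.0.0.0/0 \
   --rule-action allow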

Upvotes: 0

Vladyslav Diachenko

Reputation: 785

I had the same issue.

ERROR:  Unable to generate pre-signed url, look at engine log for details

That issue was related to this error in the engine log:

:LOG: S3 bucket names with a period (.) are not supported

The root cause of the issue, in my case, was a . (dot) in the bucket's name.

Upvotes: 1

Juha Syrjälä

Reputation: 34281

Currently (2020-10-16), at least with RDS PostgreSQL 12.4, importing files from S3 does not work if the file is not in the root of the S3 bucket.

Importing the file myfile.csv works; importing the file mydir/subdir/myfile.csv does not. The latter gives this kind of error even when permissions and everything else are working:

[XX000] ERROR: HTTP 403. Permission denied. Check bucket or provided credentials as they may no longer be valid.

This is a known issue at AWS, and according to AWS Support they are working on it.

It also seems that you need to reboot the RDS instance after adding the roles described in the documentation; otherwise the roles do not take effect. The need for a reboot is not mentioned in the documentation.
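For reference, the reboot can be done from the CLI (the instance identifier below is a placeholder):

aws rds reboot-db-instance \
   --db-instance-identifier my-rds-instance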

In general, RDS s3Import gives very confusing error messages. For example, importing a zero-length file gives this error:

[Amazon](500310) Invalid operation: HTTP 416. Check your arguments and try again.;

Upvotes: 10

morci7

Reputation: 31

Check that your RDS instance and S3 bucket are in the same region. I had the same problem and fixed it by using a bucket in the same region as my Aurora RDS.
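A quick way to check the bucket's region from the CLI (the bucket name is a placeholder; a null LocationConstraint means us-east-1):

aws s3api get-bucket-location --bucket my-bucket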

Upvotes: 3

namiops

Reputation: 81

I solved the same problem when deploying production clusters in private subnets.

Please check your cluster security group's outbound rules (that was my case).

Also add the rds-import-role under "Manage IAM roles", selecting s3Import as the feature.
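To inspect those outbound rules from the CLI (a quick check; the security group ID below is a placeholder):

aws ec2 describe-security-groups \
   --group-ids sg-0123456789abcdef0 \
   --query 'SecurityGroups[0].IpPermissionsEgress'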

I hope it helps.

Upvotes: 3

John Rotenstein

Reputation: 270124

To reproduce your situation, I did the following:

  • Launched an Amazon RDS PostgreSQL instance in a public subnet
  • Under Manage IAM Roles, I assigned AmazonRDSServiceRolePolicy for s3import
  • Created a table
  • Put a CSV file in S3
  • Used the SELECT aws_s3.table_import_from_s3() command (as above) to load the data

It worked fine for me.

Given that your error message relates to a pre-signed URL, it suggests that the credentials you provided did not have permission to access the CSV file in S3. However, you then say that you used those credentials successfully to retrieve the object. So, this is unlikely to be the cause.

Based on the Reddit thread "Having issue with AWS RDS Postgres 11+ import from S3 using RDS s3Import feature : aws", the issue might be related to the fact that the Amazon RDS instance is unable to access Amazon S3. This could be because it is in a private subnet with no NAT Gateway in the VPC. If this is the case, then you could either add a NAT Gateway to provide Internet connectivity or, as mentioned in the link, add a VPC Endpoint for S3.

Another comment in that post reported the same problem with a missing Outbound rule in the Security Group, which stopped the RDS instance from accessing Amazon S3.
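If that turns out to be your situation, an outbound HTTPS rule can be added to the instance's security group with something like this (a sketch; the group ID is a placeholder, and you could scope the destination to the regional S3 prefix list instead of 0.0.0.0/0):

aws ec2 authorize-security-group-egress \
   --group-id sg-0123456789abcdef0 \
   --ip-permissions 'IpProtocol=tcp,FromPort=443,ToPort=443,IpRanges=[{CidrIp=0.0.0.0/0}]'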

Upvotes: 8
