Reputation: 15809
I'm having a hard time importing data from S3 into an RDS postgres instance. According to the docs, you can use this syntax:
aws_s3.table_import_from_s3 (
table_name text,
column_list text,
options text,
bucket text,
file_path text,
region text,
access_key text,
secret_key text,
session_token text
)
So, in pgAdmin, I did this:
SELECT aws_s3.table_import_from_s3(
'contacts_1',
'firstname,lastname,imported',
'(format csv)',
'com.foo.mybucket',
'mydir/subdir/myfile.csv',
'us-east-2',
'AKIAYYXUMxxxxxxxxxxx',
'3zB4S5jb1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
);
I also tried it with an explicit NULL for the last parameter.
The error message I get is:
NOTICE: CURL error code: 51 when attempting to validate pre-signed URL, 1 attempt(s) remaining
NOTICE: CURL error code: 51 when attempting to validate pre-signed URL, 0 attempt(s) remaining
ERROR: Unable to generate pre-signed url, look at engine log for details.
SQL state: XX000
I checked the server logs and there was no further information.
I have triple-checked the correctness of all the parameters. How do I make this work?
UPDATE:
I can confirm that I can do an s3.getObject() in the Java aws sdk using these same credentials.
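For reference, an equivalent check from a shell would look something like this (bucket, key, and region are taken from the question; the credential values are the same placeholders as above):

```shell
# Sanity-check that the same credentials can read the object outside
# the database. If this succeeds, the credentials themselves are fine.
AWS_ACCESS_KEY_ID='AKIAYYXUMxxxxxxxxxxx' \
AWS_SECRET_ACCESS_KEY='3zB4S5jb1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' \
aws s3api get-object \
  --bucket com.foo.mybucket \
  --key mydir/subdir/myfile.csv \
  --region us-east-2 \
  /tmp/myfile.csv
```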
Upvotes: 28
Views: 36111
Reputation: 131
I found out it only works for RDS PostgreSQL, the standalone DB instance option, but not Aurora PostgreSQL, the cluster hosting option.
I am seeing the same error on an Aurora cluster when copying data from S3.
SQL Error [XX000]: ERROR: Unable to generate pre-signed url, look at engine log for details.
Where: SQL function "table_import_from_s3" statement 1
For triage, I created an RDS Aurora PostgreSQL DB cluster and an RDS PostgreSQL DB instance using the same VPC, private subnets, VPC endpoint, IAM role and policies, and security groups.
For Aurora PostgreSQL you can only attach the rds-s3-import-role role
to the cluster, not to the writer instance:
aws rds add-role-to-db-cluster \
--feature-name s3Import \
--db-cluster-identifier arn:aws:rds:eu-west-1:12345678900:cluster:playground-cluster \
--role-arn arn:aws:iam::12345678900:role/rds-s3-import-role \
--region eu-west-1
Attempting to attach the role to the writer instance instead fails:
aws rds add-role-to-db-instance \
--db-instance-identifier magna-playground \
--feature-name s3Import \
--role-arn arn:aws:iam::12345678900:role/rds-s3-import-role \
--region eu-west-1
An error occurred (InvalidDBInstanceState) when calling the AddRoleToDBInstance operation: The magna-playground DB instance is associated with a database cluster. Manage the arn:aws:iam::123456789000:role/rds-s3-import-role IAM role from the cluster instead of from the DB instance.
For an RDS PostgreSQL instance, this works:
aws rds add-role-to-db-instance \
--db-instance-identifier test-db-1-s3-copy \
--feature-name s3Import \
--role-arn arn:aws:iam::123456789000:role/rds-s3-import-role \
--region eu-west-1
I think the problem is that the cluster's writer node needs the IAM role attached to it directly in order to access S3; granting the role at the cluster level doesn't seem to work.
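If it helps with triage, the roles actually associated with the cluster, and their status, can be listed with something like this (cluster identifier and region are taken from the example above):

```shell
# List the IAM roles associated with the Aurora cluster; the s3Import
# role should report "Status": "ACTIVE" before the import can work.
aws rds describe-db-clusters \
  --db-cluster-identifier playground-cluster \
  --query 'DBClusters[0].AssociatedRoles' \
  --region eu-west-1
```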
Upvotes: 2
Reputation: 455
The main issue here is that you need to 1) add an IAM role to the RDS instance to access the S3 bucket and 2) add an S3 endpoint to the VPC where the RDS instance runs in order to allow the communication.
This is the procedure I followed to make it work, using AWS CLI commands in a shell (take care to give proper values to the environment variables involved); hope it can help:
$ aws iam create-role \
--role-name $ROLE_NAME \
--assume-role-policy-document '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Principal": {"Service": "rds.amazonaws.com"}, "Action": "sts:AssumeRole"}]}'
$ aws iam create-policy \
--policy-name $POLICY_NAME \
--policy-document "{\"Version\": \"2012-10-17\", \"Statement\": [{\"Sid\": \"s3import\", \"Action\": [\"s3:GetObject\", \"s3:ListBucket\"], \"Effect\": \"Allow\", \"Resource\": [\"arn:aws:s3:::${BUCKET_NAME}\", \"arn:aws:s3:::${BUCKET_NAME}/*\"]}]}"
$ aws iam attach-role-policy \
--policy-arn arn:aws:iam::$AWS_ACCOUNT_ID:policy/$POLICY_NAME \
--role-name $ROLE_NAME
$ aws rds add-role-to-db-instance \
--db-instance-identifier $RDS_INSTANCE_NAME \
--feature-name s3Import \
--role-arn arn:aws:iam::$AWS_ACCOUNT_ID:role/$ROLE_NAME \
--region $REGION
$ aws ec2 create-vpc-endpoint \
--vpc-id $VPC_ID \
--service-name com.amazonaws.$REGION.s3 \
--route-table-ids $ROUTE_TABLE_ID
The route table id related to the VPC where the endpoint is created can be retrieved through the command
$ aws ec2 describe-route-tables | jq -r '.RouteTables[] | "\(.VpcId) \(.RouteTableId)"'
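To confirm that the role attachment took effect, a check along these lines (same variables as above) should show the role with an ACTIVE status:

```shell
# Show the roles associated with the RDS instance; the s3Import role
# should eventually report "Status": "ACTIVE".
aws rds describe-db-instances \
  --db-instance-identifier $RDS_INSTANCE_NAME \
  --query 'DBInstances[0].AssociatedRoles' \
  --region $REGION
```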
Upvotes: 29
Reputation: 31
Had a similar problem: Postgres in a private subnet.
Solved it with a VPC endpoint for S3.
Upvotes: 1
Reputation: 23
One cause of this issue, assuming you've otherwise configured everything correctly, may be Network ACLs. Make sure there are NACL rules explicitly allowing connectivity between the DB and S3.
In our case, we used the terraform-aws-modules/vpc/aws module, which provides for broad VPC configuration, including NACLs. After manually inspecting the resulting configuration in the AWS Console, we found that the rules generated from the configuration provided to the module were too restrictive. This wasn't reflected in any error messages, as the requests from RDS to S3 were simply blocked at the network level.
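As a sketch (the NACL ID and rule numbers are hypothetical), rules permitting HTTPS to S3 plus the ephemeral-port return traffic could be added like this:

```shell
# Allow outbound HTTPS from the subnet (RDS -> S3).
aws ec2 create-network-acl-entry \
  --network-acl-id acl-0123456789abcdef0 \
  --egress \
  --rule-number 110 \
  --protocol tcp \
  --port-range From=443,To=443 \
  --cidr-block 0.0.0.0/0 \
  --rule-action allow

# Allow the inbound ephemeral-port return traffic (S3 -> RDS).
aws ec2 create-network-acl-entry \
  --network-acl-id acl-0123456789abcdef0 \
  --ingress \
  --rule-number 110 \
  --protocol tcp \
  --port-range From=1024,To=65535 \
  --cidr-block 0.0.0.0/0 \
  --rule-action allow
```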
Upvotes: 0
Reputation: 785
I had the same issue.
ERROR: Unable to generate pre-signed url, look at engine log for details
that issue was related to this error in the engine log:
LOG: S3 bucket names with a period (.) are not supported
The root cause of the issue, in my case, was a period (.) in the bucket's name.
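A possible workaround (bucket names here are made up, following the question's example) is to copy the file into a bucket whose name contains no periods and import from there:

```shell
# Copy the CSV from the dotted bucket to a dash-named bucket, then point
# aws_s3.table_import_from_s3 at the new bucket.
aws s3 cp s3://com.foo.mybucket/mydir/subdir/myfile.csv \
          s3://com-foo-mybucket/mydir/subdir/myfile.csv
```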
Upvotes: 1
Reputation: 34281
Currently (2020-10-16), at least with RDS PostgreSQL 12.4, importing files from S3 does not work if the file is not in the root of the S3 bucket.
Importing the file myfile.csv works; importing the file mydir/subdir/myfile.csv doesn't. The latter will give this kind of error even if permissions and everything else are working:
[XX000] ERROR: HTTP 403. Permission denied. Check bucket or provided credentials as they may no longer be valid.
This is a known issue at AWS, and according to AWS Support they are working on it.
Also, it seems that you need to reboot the RDS instance after adding the roles described in the documentation; otherwise the roles do not have any effect. The need for a reboot is not mentioned in the documentation.
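The reboot can be done from the CLI, for example (the instance identifier is hypothetical):

```shell
# Reboot the RDS instance so the newly attached s3Import role takes effect.
aws rds reboot-db-instance \
  --db-instance-identifier my-rds-instance
```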
In general, RDS s3Import gives very confusing error messages. For example, importing a zero-length file gives this error:
[Amazon](500310) Invalid operation: HTTP 416. Check your arguments and try again.;
Upvotes: 10
Reputation: 31
Check that your RDS instance and your S3 bucket are in the same region. I had the same problem and fixed it by using a bucket in the same region as my Aurora RDS.
Upvotes: 3
Reputation: 81
I solved the same problem when deploying production clusters in private subnets.
Check your cluster security group's outbound rules (that was my case).
Also add the rds-import-role under "Manage IAM roles", with the selected feature being s3Import.
I hope it helps.
Upvotes: 3
Reputation: 270124
To reproduce your situation, I did the following:
- Attached the AmazonRDSServiceRolePolicy role for the s3Import feature
- Ran the SELECT aws_s3.table_import_from_s3() command (as above) to load the data
It worked fine for me.
Given that your error message relates to a pre-signed URL, it suggests that the credentials you provided did not have permission to access the CSV file in S3. However, you then say that you used those credentials successfully to retrieve the object. So, this is unlikely to be the cause.
Based on Reddit: Having issue with AWS RDS Postgres 11+ import from S3 using RDS s3Import feature : aws, the issue might be related to the fact that the Amazon RDS instance is unable to access Amazon S3. This could be because it is in a private subnet with no NAT Gateway in the VPC. If this is the case, then you could either add a NAT Gateway to provide Internet connectivity or, as mentioned in the link, add a VPC Endpoint for S3.
Another comment in that post reported the same problem with a missing Outbound rule in the Security Group, which stopped the RDS instance from accessing Amazon S3.
Upvotes: 8