aarbor

Reputation: 1534

Amazon Aurora 1.8 Load Data From S3 - Cannot Instantiate S3 Client

With the latest Aurora update (1.8), the command LOAD DATA FROM S3 was introduced. Has anyone gotten this to work? After upgrading to 1.8, I followed the setup guide here to create the role that allows RDS to access S3.

After rebooting the server and trying to run the command

LOAD DATA FROM S3 PREFIX 's3://<bucket_name>/prefix' INTO TABLE table_name

in SQL Workbench/J, I get the errors:

Warnings:
S3 API returned error: Missing Credentials: Cannot instantiate S3 Client
S3 API returned error: Failed to instantiate S3 Client
Internal error: Unable to initialize S3Stream

Are there any additional steps required? Can I only run this from the SDK? I don't see that mentioned anywhere in the documentation.

Upvotes: 15

Views: 28386

Answers (11)

Gayathri

Reputation: 11

For anyone having this issue: mine was caused by case sensitivity of the file name, even after all permission issues were resolved. I got this error when the filename I specified in the LOAD XML command differed from how it was uploaded to S3. Copying the filename exactly as it appears in S3 into the command fixed the error.

Upvotes: 0

Amiri

Reputation: 41

I had the same error while trying to LOAD DATA FROM S3 using MySQL Workbench. I was already able to successfully CREATE DATABASE and CREATE TABLE, so I knew my connection was working.

I closely followed all of the AWS documentation instructions for Loading data into an Amazon Aurora MySQL DB cluster from text files in an Amazon S3 bucket.

In my case, I had not correctly followed instruction steps 3 and 4 (see the list of instructions under the subheading "Giving Aurora access to Amazon S3" at the link above).

What fixed it for me:

  1. From Amazon RDS, I selected "Parameter Groups" in the navigation pane on the left.
  2. Then I clicked on my newly created custom DB cluster parameter group (step 3 from the link above).
  3. From within my custom group, I searched for aurora_load_from_s3_role and then in the "Values" entry box, I copy/pasted the ARN for the Role that I had just created in step 2 of the instructions into this box and clicked Save (step 4 from the link above).

I went back to MySQL Workbench and reran my LOAD DATA FROM S3 command and it worked!
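The parameter-group change above can also be made from the AWS CLI; a sketch, assuming a custom cluster parameter group named my-aurora-params and a placeholder role ARN:

```shell
# Set the Aurora S3 load role on a custom DB cluster parameter group.
# The group name and role ARN below are placeholders -- substitute your own.
aws rds modify-db-cluster-parameter-group \
    --db-cluster-parameter-group-name my-aurora-params \
    --parameters "ParameterName=aurora_load_from_s3_role,ParameterValue=arn:aws:iam::123456789012:role/AllowAuroraS3Role,ApplyMethod=pending-reboot"
```

With ApplyMethod=pending-reboot, the value takes effect after the instances are rebooted, which matches the reboot step in the original question.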

Upvotes: 0

Paresh Shidruk

Reputation: 1

It worked for me after following steps 2 to 5 and creating a VPC endpoint for S3 access.
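Creating the S3 gateway endpoint can be scripted as well; a sketch, with placeholder VPC and route table IDs and an assumed region:

```shell
# Create a gateway VPC endpoint for S3 so the Aurora cluster can reach S3
# without internet access. The IDs and region below are placeholders.
aws ec2 create-vpc-endpoint \
    --vpc-id vpc-0abc123 \
    --service-name com.amazonaws.us-east-1.s3 \
    --route-table-ids rtb-0def456
```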

Upvotes: 0

enharmonic

Reputation: 2098

If the only error is Internal error: Unable to initialize S3Stream and it is thrown immediately, a likely culprit is a malformed S3 path. Per the AWS documentation, the path includes the following values:

  • region (optional) – The AWS Region that contains the Amazon S3 bucket to load from. This value is optional. If you don't specify a region value, then Aurora loads your file from Amazon S3 in the same region as your DB cluster.
  • bucket-name – The name of the Amazon S3 bucket that contains the data to load. Object prefixes that identify a virtual folder path are supported.
  • file-name-or-prefix – The name of the Amazon S3 text file or XML file, or a prefix that identifies one or more text or XML files to load. You can also specify a manifest file that identifies one or more text files to load.
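A quick way to sanity-check the path is to list the objects your URI is supposed to match, since bucket, prefix, and file names are case-sensitive; a sketch with a hypothetical bucket and prefix:

```shell
# List the exact (case-sensitive) object keys under the prefix that your
# LOAD DATA FROM S3 URI must match. The bucket and prefix are placeholders.
aws s3 ls s3://my-bucket/my-prefix/ --recursive
```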

Upvotes: 7

Vikash Rathee

Reputation: 2084

March 2019:

The RDS console doesn't have the option to change the role anymore. What worked for me was adding the role via the CLI and then rebooting the writer instance.

aws rds add-role-to-db-cluster --db-cluster-identifier my-cluster --role-arn arn:aws:iam::123456789012:role/AllowAuroraS3Role
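The reboot of the writer instance can be done from the CLI too; a sketch with a placeholder instance identifier:

```shell
# Reboot the writer instance so the cluster picks up the newly added role.
# The instance identifier is a placeholder.
aws rds reboot-db-instance --db-instance-identifier my-cluster-writer
```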

Upvotes: 3

Willie Z

Reputation: 506

I have run into this error in multiple situations.

  1. The error was thrown after the LOAD statement had been running for a while (around 220 s), which suggests a timeout. It turned out the security group for my RDS subnet group had only one outbound rule, and it did not allow traffic to S3. Adding an outbound rule to S3 fixed the issue.

  2. The error was thrown immediately (0.2 s). I had successfully loaded data from S3 before, but after a change to the S3 URL the error appeared again: I was using the wrong S3 URL, because I wanted to load by S3 prefix instead of by file. Check the LOAD syntax to make sure your SQL is right.
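The outbound rule in case 1 can be added from the CLI; a sketch, with a placeholder security group ID, assuming HTTPS egress is what Aurora needs to reach S3:

```shell
# Allow outbound HTTPS from the DB security group so Aurora can reach S3.
# The group ID is a placeholder; narrow the CIDR if you use an S3 VPC endpoint.
aws ec2 authorize-security-group-egress \
    --group-id sg-0abc123 \
    --protocol tcp \
    --port 443 \
    --cidr 0.0.0.0/0
```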

Upvotes: 0

Rajesh Goel

Reputation: 3383

For me, the missing step was adding the created RDS role to my S3 bucket. Once I added it, it worked instantly.

Upvotes: 1

utdrmac

Reputation: 781

After trying all the suggestions above, as a final step I had to add a VPC endpoint for S3. After that, everything started working.

Upvotes: 3

Dan Carrington

Reputation: 514

I had the same issue. I tried adding AmazonS3FullAccess to the IAM role that my RDS instances were using...no joy.

After poking around, I went into the RDS console, to Clusters, selected my Aurora cluster, and clicked Manage IAM Roles. It gave me a drop-down, from which I selected the IAM role (the same one the individual instances were using).

Once I did that, all was well and data load was nice and fast.

So, there are (for us) 5 steps/components:

1) The S3 bucket and bucket policy to allow a user to upload the object

{
    "Version": "2012-10-17",
    "Id": "Policy1453918146601",
    "Statement": [
        {
            "Sid": "Stmt1453917898368",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<account id>:<user/group/role>/<IAM User/Group/Role>"
            },
            "Action": [
                "s3:DeleteObject",
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": "arn:aws:s3:::<bucket name>/*"
        }
    ]
}

The "Principal" would be whatever IAM user, group or role will be uploading the data files to the bucket so that the RDS instance can import the data.

2) The IAM policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1486490368000",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<bucket name>/*"
            ]
        }
    ]
}

This is pretty simple with the Policy Generator.

3) Create the IAM Role:

This role should be assigned the IAM policy above. You can probably do an inline policy, too, if you're not going to use this policy for other roles down the line, but I like the idea of having a defined policy that I can reference later if I have a need.

4) Configure a Parameter Group that your cluster/instances will use to set the aws_default_s3_role value to the ARN of the role from #3 above.

5) Configure the Aurora Cluster by going to Clusters, selecting your cluster, selecting Manage IAM Roles and setting the IAM Role for your DB Cluster
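Steps 3 through 5 can also be sketched with the CLI; all names and ARNs below are placeholders, and the trust policy assumes the standard rds.amazonaws.com service principal:

```shell
# 3) Create the role Aurora will assume, trusting the RDS service.
aws iam create-role \
    --role-name AuroraS3LoadRole \
    --assume-role-policy-document '{
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "rds.amazonaws.com"},
        "Action": "sts:AssumeRole"
      }]
    }'

# Attach the S3 read policy from step 2 (policy ARN is a placeholder).
aws iam attach-role-policy \
    --role-name AuroraS3LoadRole \
    --policy-arn arn:aws:iam::123456789012:policy/AuroraS3ReadPolicy

# 4) Point the cluster parameter group at the role.
aws rds modify-db-cluster-parameter-group \
    --db-cluster-parameter-group-name my-aurora-params \
    --parameters "ParameterName=aws_default_s3_role,ParameterValue=arn:aws:iam::123456789012:role/AuroraS3LoadRole,ApplyMethod=pending-reboot"

# 5) Associate the role with the cluster itself.
aws rds add-role-to-db-cluster \
    --db-cluster-identifier my-cluster \
    --role-arn arn:aws:iam::123456789012:role/AuroraS3LoadRole
```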

At least for me, these steps worked like a charm.

Hope that helps!

Upvotes: 39

aarbor

Reputation: 1534

I reached out to the Amazon Aurora team and they confirmed there are edge cases where some servers have this issue. They are rolling out a patch to fix it soon, but in the meantime they manually applied the patch to my cluster.

Upvotes: 0

Ray

Reputation: 21905

You need to attach the AmazonS3ReadOnlyAccess or AmazonS3FullAccess policy to the role you set up in IAM. This step was not included in the setup guide.

Go to IAM -> Roles in the AWS console, select the role you are using, click 'attach policy', then scroll way down to the S3 policies and pick one.
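Attaching the managed policy can also be done from the CLI; a sketch with a placeholder role name:

```shell
# Attach the AWS managed read-only S3 policy to the role Aurora uses.
# The role name is a placeholder.
aws iam attach-role-policy \
    --role-name AllowAuroraS3Role \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
```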

Upvotes: 1
