Reputation: 155
I’m using Spark 2.1.1 with Hadoop 2.7.3 and I’m consuming data from different S3 locations in one pipeline.
I'm setting the s3a credentials with `spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", $KEY)` and doing the same for the secret key.
It works well when I'm consuming different S3 buckets, but when I have different credentials for the same bucket (folder-level permissions within one S3 bucket), only the first pair of credentials is used.
When I try to access files with the second pair, the Spark config doesn't seem to be updated and the call to S3 fails with a 403 error.
What I want to achieve is to process files from the same S3 bucket using different credentials in one batch.
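For context, a minimal sketch of what the pipeline does (the bucket name, prefixes, and key variables are placeholders, not the real ones from my job):

```scala
val hc = spark.sparkContext.hadoopConfiguration

// First prefix: set the first pair of credentials, then read.
hc.set("fs.s3a.access.key", accessKeyA)
hc.set("fs.s3a.secret.key", secretKeyA)
val dfA = spark.read.json("s3a://my-bucket/team-a/")

// Second prefix in the *same* bucket: re-setting the keys appears to have
// no effect, and this read fails with a 403 from S3.
hc.set("fs.s3a.access.key", accessKeyB)
hc.set("fs.s3a.secret.key", secretKeyB)
val dfB = spark.read.json("s3a://my-bucket/team-b/")
```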
Upvotes: 1
Views: 1197
Reputation: 1
I found this link on configuring default credentials for all buckets plus separate credentials for one specific bucket, but the default credentials are not being applied to the other buckets. It looks like the SparkContext is not picking up the per-bucket configuration described at the link below: https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Configuring_different_S3_buckets_with_Per-Bucket_Configuration
I'd appreciate hearing from anyone who has come across this and implemented it.
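For reference, this is roughly how I'm applying the per-bucket settings from that page (bucket names and key variables are placeholders). As far as I can tell, per-bucket configuration only exists in hadoop-aws 2.8.0 and later, so on Hadoop 2.7.3 these keys would simply be ignored:

```scala
val hc = spark.sparkContext.hadoopConfiguration

// Default credentials, meant to apply to every bucket without an override.
hc.set("fs.s3a.access.key", defaultAccessKey)
hc.set("fs.s3a.secret.key", defaultSecretKey)

// Per-bucket override for one specific bucket.
hc.set("fs.s3a.bucket.special-bucket.access.key", specialAccessKey)
hc.set("fs.s3a.bucket.special-bucket.secret.key", specialSecretKey)
```

Even when this works, it still allows only one credential set per bucket, so it wouldn't cover the same-bucket case from the original question.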
Upvotes: 0
Reputation: 13480
No real support for this. Each S3A connector instance only has one set of credentials, and the first S3A filesystem instance for a specific bucket is cached in the filesystem cache by its URI. The next time an instance for that filesystem URI is looked up, the existing one, with its original credentials, is returned.
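To illustrate the caching: the generic Hadoop option `fs.s3a.impl.disable.cache` turns the FileSystem cache off for the s3a scheme, so every lookup builds a fresh connector from the current configuration. A sketch only, with placeholder names; a new client per lookup has a real cost, and credential handling on executors is a separate question:

```scala
// Sketch, not a supported pattern: with the cache disabled, each FileSystem
// lookup creates a new S3AFileSystem from the *current* configuration instead
// of returning the cached instance with its original credentials.
val hc = spark.sparkContext.hadoopConfiguration
hc.set("fs.s3a.impl.disable.cache", "true")

// Swapping the keys between reads of the same bucket now at least produces
// a new connector carrying the new credentials.
hc.set("fs.s3a.access.key", accessKeyB)               // placeholder variable
hc.set("fs.s3a.secret.key", secretKeyB)               // placeholder variable
val dfB = spark.read.json("s3a://my-bucket/team-b/")  // placeholder path
```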
Upvotes: 1