Tomasz

Reputation: 155

Spark using multiple credentials for the same S3 bucket

I’m using Spark 2.1.1 with Hadoop 2.7.3 and I’m consuming data from different S3 locations in one pipeline.

I’m setting the s3a credentials with spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", $KEY) and doing the same for the secret key.
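For reference, a minimal sketch of the setup described above, assuming a Scala SparkSession named `spark`; the bucket name, folder paths and environment variables are placeholders:

```scala
// Sketch of the global-credential setup described in the question.
// Key values and the bucket/path names are placeholders.
val hadoopConf = spark.sparkContext.hadoopConfiguration

// Global S3A credentials: they apply to every s3a:// URI read in this session.
hadoopConf.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
hadoopConf.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

// First read succeeds with these credentials.
val dfA = spark.read.parquet("s3a://my-bucket/team-a/")

// Swapping the keys afterwards and reading another folder in the same bucket
// does not take effect, which leads to the 403 described below.
hadoopConf.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID_B"))
hadoopConf.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY_B"))
val dfB = spark.read.parquet("s3a://my-bucket/team-b/")
```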

It works well when I’m consuming different S3 buckets, but when I have different credentials for the same bucket (folder-level permissions within one S3 bucket), only the first pair of credentials is used.

When I try to access files using the second pair, it seems the Spark configuration is not updated and the call to S3 fails with a 403 error.

What I want to achieve is to process files from the same S3 bucket using different credentials in one batch.

Upvotes: 1

Views: 1197

Answers (2)

Naresh Dulam

Reputation: 1

I found this link on configuring default credentials for all buckets plus separate credentials for one specific bucket, but it is not taking the default credentials for the other buckets. It looks like the SparkContext is not picking up the configuration described at the link below. https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Configuring_different_S3_buckets_with_Per-Bucket_Configuration
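For what it's worth, here is a sketch of the per-bucket pattern that page describes, with placeholder bucket and key names; as far as I know the fs.s3a.bucket.* options need a Hadoop 2.8+ S3A client, and they distinguish buckets, not folders within one bucket:

```scala
val hadoopConf = spark.sparkContext.hadoopConfiguration

// Placeholder credential values.
val (defaultKey, defaultSecret) = ("DEFAULT_ACCESS_KEY", "DEFAULT_SECRET_KEY")
val (specialKey, specialSecret) = ("SPECIAL_ACCESS_KEY", "SPECIAL_SECRET_KEY")

// Default credentials, used for any bucket without a per-bucket override.
hadoopConf.set("fs.s3a.access.key", defaultKey)
hadoopConf.set("fs.s3a.secret.key", defaultSecret)

// Per-bucket override for one specific bucket ("special-bucket" is a placeholder);
// these take precedence over the fs.s3a.* defaults for that bucket only.
hadoopConf.set("fs.s3a.bucket.special-bucket.access.key", specialKey)
hadoopConf.set("fs.s3a.bucket.special-bucket.secret.key", specialSecret)
```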

I’d appreciate hearing from anyone who is aware of this and has implemented it.

Upvotes: 0

stevel

Reputation: 13480

There is no real support for this. Each S3A connector instance has only one set of credentials, and the first S3A filesystem instance created for a specific bucket is cached in the filesystem cache, keyed by its URI. The next time an instance for that filesystem URI is looked up, the existing one, with its original credentials, is returned.
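A small sketch of the caching behaviour described above, plus the usual workarounds as I understand them (disabling the cache, or asking for an uncached instance); treat the property and the trade-offs as assumptions to verify rather than a recommendation:

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

// Placeholder credential values.
val (firstKey, firstSecret)   = ("FIRST_ACCESS_KEY", "FIRST_SECRET_KEY")
val (secondKey, secondSecret) = ("SECOND_ACCESS_KEY", "SECOND_SECRET_KEY")

val conf = new Configuration()
conf.set("fs.s3a.access.key", firstKey)
conf.set("fs.s3a.secret.key", firstSecret)

// First lookup creates and caches the S3A filesystem for this bucket URI.
val fs1 = FileSystem.get(new URI("s3a://my-bucket/"), conf)

// Changing the keys later has no effect: the cache returns the same instance,
// still holding the first credentials, hence the 403 in the question.
conf.set("fs.s3a.access.key", secondKey)
conf.set("fs.s3a.secret.key", secondSecret)
val fs2 = FileSystem.get(new URI("s3a://my-bucket/"), conf)
// fs1 eq fs2 is true

// Possible workarounds, each paying for extra connector instances and pools:
// disable the s3a filesystem cache entirely...
conf.setBoolean("fs.s3a.impl.disable.cache", true)
// ...or ask for an uncached instance explicitly for the second set of credentials.
val fresh = FileSystem.newInstance(new URI("s3a://my-bucket/"), conf)
```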

Upvotes: 1
