Reputation: 155
I’m using Spark 2.1.1 with Hadoop 2.7.3 and I’m consuming data from different S3 locations in one pipeline.
I'm setting the s3a credentials with `spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", $KEY)` and doing the same for the secret key.
It works well when I'm consuming different S3 buckets, but when I have different credentials for the same bucket (folder-level permissions within one S3 bucket), only the first pair of credentials is used.
When I try to access files with the second pair, the Spark config doesn't seem to be updated and the call to S3 fails with a 403 error.
What I want to achieve is to process files from the same S3 bucket using different credentials in one batch.
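For context, a minimal sketch of what the pipeline does (the bucket name, prefixes, and key variables are placeholders, not the real ones from my job):

```scala
val hc = spark.sparkContext.hadoopConfiguration

// First prefix: set the first pair of credentials, then read.
hc.set("fs.s3a.access.key", accessKeyA)
hc.set("fs.s3a.secret.key", secretKeyA)
val dfA = spark.read.json("s3a://my-bucket/team-a/")

// Second prefix in the *same* bucket: re-setting the keys appears to have
// no effect, and this read fails with a 403 from S3.
hc.set("fs.s3a.access.key", accessKeyB)
hc.set("fs.s3a.secret.key", secretKeyB)
val dfB = spark.read.json("s3a://my-bucket/team-b/")
```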
Upvotes: 1
Views: 1197
Reputation: 1
I found this link on configuring default credentials for all buckets plus separate credentials for one specific bucket, but the default credentials are not being applied to the other buckets. It looks like the SparkContext is not picking up the per-bucket configuration described at the link below: https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Configuring_different_S3_buckets_with_Per-Bucket_Configuration
I'd appreciate hearing from anyone who has come across this and implemented it.
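For reference, this is roughly how I'm applying the per-bucket settings from that page (bucket names and key variables are placeholders). As far as I can tell, per-bucket configuration only exists in hadoop-aws 2.8.0 and later, so on Hadoop 2.7.3 these keys would simply be ignored:

```scala
val hc = spark.sparkContext.hadoopConfiguration

// Default credentials, meant to apply to every bucket without an override.
hc.set("fs.s3a.access.key", defaultAccessKey)
hc.set("fs.s3a.secret.key", defaultSecretKey)

// Per-bucket override for one specific bucket.
hc.set("fs.s3a.bucket.special-bucket.access.key", specialAccessKey)
hc.set("fs.s3a.bucket.special-bucket.secret.key", specialSecretKey)
```

Even when this works, it still allows only one credential set per bucket, so it wouldn't cover the same-bucket case from the original question.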
Upvotes: 0
Reputation: 13480
No real support for this. Each S3A connector instance only has one set of credentials, and the first S3A filesystem instance for a specific bucket is cached in the filesystem cache by its URI. The next time an instance for that filesystem URI is looked up, the existing one, with its original credentials, is returned.
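To illustrate the caching: the generic Hadoop option `fs.s3a.impl.disable.cache` turns the FileSystem cache off for the s3a scheme, so every lookup builds a fresh connector from the current configuration. A sketch only, with placeholder names; a new client per lookup has a real cost, and credential handling on executors is a separate question:

```scala
// Sketch, not a supported pattern: with the cache disabled, each FileSystem
// lookup creates a new S3AFileSystem from the *current* configuration instead
// of returning the cached instance with its original credentials.
val hc = spark.sparkContext.hadoopConfiguration
hc.set("fs.s3a.impl.disable.cache", "true")

// Swapping the keys between reads of the same bucket now at least produces
// a new connector carrying the new credentials.
hc.set("fs.s3a.access.key", accessKeyB)               // placeholder variable
hc.set("fs.s3a.secret.key", secretKeyB)               // placeholder variable
val dfB = spark.read.json("s3a://my-bucket/team-b/")  // placeholder path
```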
Upvotes: 1