AppleCEO

Reputation: 73

Spark gets 'The AWS Access Key Id you provided does not exist in our records'

I have an S3 access key and secret key, and they work fine with Python 3 + boto3 and with Java. But when I use the same credentials in spark-shell, or spark-submit a jar written in Scala, it throws the exception below.

Spark 2.4.5, Hadoop 2.7.2, Java 8, Scala 2.11.x

In SPARK_HOME/jars:

aws-java-sdk-1.12.183.jar, aws-java-sdk-core-1.11.492.jar, aws-java-sdk-s3-1.11.447.jar

Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 Error Message. -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>InvalidAccessKeyId</Code><Message>The AWS Access Key Id you provided does not exist in our records.</Message><AWSAccessKeyId>MYACCESS</AWSAccessKeyId><RequestId>MYREQUEST</RequestId><HostId>xxxxxxxxx</HostId></Error>
    at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:175)
    at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:221)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)

Python demo (not PySpark) - works fine:

import boto3
from botocore.client import Config

# SigV4 signing, path-style addressing, custom endpoint
s3_cli = boto3.client(
    's3',
    config=Config(signature_version='s3v4',
                  s3={'addressing_style': 'path'}),
    use_ssl=False,
    endpoint_url='MY_ENDPOINT_URL',
    aws_secret_access_key='MY_SECRET',
    aws_access_key_id='MY_ACCESS')

with open('txtFromS3.txt', 'wb') as data:
    s3_cli.download_fileobj('MY_BUCKET', 'myTxtOnS3.txt', data)

Scala (in spark-shell or spark-submit --class xx xxx.jar) - throws the exception:

sc.hadoopConfiguration.set("fs.s3.endpoint", "MY_ENDPOINT")# s3a、
sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", 'MY_ACCESS');
sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", 'MY_SECRET');

val rdd = sc.textFile("s3://MY_BUCKET/myTxtOnS3.txt")
println(rdd.count())

I have tried almost all the solutions I could find, including the fs.s3a.* and fs.s3n.* variants:

fs.s3X.access.key / fs.s3X.secret.key, where X is a, n, or nothing

sc.textFile("s3X://MY_ACCESS:MY_SECRET/[email protected]")

Upvotes: 1

Views: 1515

Answers (1)

AppleCEO

Reputation: 73

For those who come across this exception:

Cause: my client node runs Spark 2.4.5 + Hadoop 2.7, but the nodes in the cluster run Hadoop 3.2.0, so the root cause is a Hadoop version mismatch.

Even though I replaced hadoop-aws-2.7.2.jar with hadoop-aws-3.2.0.jar, it still did not work; the exceptions were various 40X errors: InvalidAccessKeyId, Permission denied, etc.
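
A quick way to confirm this kind of mismatch is to print the Hadoop version the client's Spark is actually using and compare it with the cluster. A minimal check in spark-shell (assuming the default classpath):

// Hadoop version on the client's classpath
println(org.apache.hadoop.util.VersionInfo.getVersion)
// compare with the output of `hadoop version` on a cluster node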

Solution

  • upgrade Hadoop 2.7 on the client node to Hadoop 3.2.0
  • use aws-java-sdk-1.12.183.jar and hadoop-aws-3.2.0.jar
  • set fs.s3a.endpoint, fs.s3a.access.key and fs.s3a.secret.key (not fs.s3.* and awsSecretAccessKey, in my case) -- see the sketch below
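
A minimal spark-shell sketch of that configuration, assuming the same placeholder endpoint, bucket, and credentials as in the question (the path-style setting is my assumption, mirroring the boto3 config above):

// requires a hadoop-aws jar matching the cluster's Hadoop version on the classpath
sc.hadoopConfiguration.set("fs.s3a.endpoint", "MY_ENDPOINT")
sc.hadoopConfiguration.set("fs.s3a.access.key", "MY_ACCESS")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "MY_SECRET")
sc.hadoopConfiguration.set("fs.s3a.path.style.access", "true") // assumption: path-style addressing, as in the boto3 demo

val rdd = sc.textFile("s3a://MY_BUCKET/myTxtOnS3.txt")
println(rdd.count())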

Upvotes: 1
