Reputation: 73
I have an access key and secret key for S3. They work fine with Python 3 + boto3 and with Java respectively, but when I use the same access key and secret key in the Spark shell, or spark-submit a xx.jar coded in Scala, it throws the exception below.
Environment: Spark 2.4.5, Hadoop 2.7.2, Java 8, Scala 2.11.x
In SPARK_HOME/jars: aws-java-sdk-1.12.183.jar, aws-java-sdk-core-1.11.492.jar, aws-java-sdk-s3-1.11.447.jar
Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 Error Message. -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>InvalidAccessKeyId</Code><Message>The AWS Access Key Id you provided does not exist in our records.</Message><AWSAccessKeyId>MYACCESS</AWSAccessKeyId><RequestId>MYREQUEST</RequestId><HostId>xxxxxxxxx</HostId></Error>
    at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:175)
    at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:221)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
Python demo (boto3, not PySpark) - works fine:
import boto3
from botocore.client import Config

# Plain boto3 client against the custom endpoint: SigV4 signing, path-style addressing.
s3_cli = boto3.client('s3',
                      config=Config(signature_version='s3v4',
                                    s3={'addressing_style': 'path'}),
                      use_ssl=False,
                      endpoint_url='MY_ENDPOINT_URL',
                      aws_secret_access_key='MY_SECRET',
                      aws_access_key_id='MY_ACCESS')

# Download succeeds with the very same credentials.
with open('txtFromS3.txt', 'wb') as data:
    s3_cli.download_fileobj('MY_BUCKET', 'myTxtOnS3.txt', data)
Scala (in spark-shell, or spark-submit --class xx xxx.jar) - throws the exception:
sc.hadoopConfiguration.set("fs.s3.endpoint", "MY_ENDPOINT")# s3a、
sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", 'MY_ACCESS');
sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", 'MY_SECRET');
val rdd = sc.textFile("s3://MY_BUCKET/myTxtOnS3.txt")
println(rdd.count())
I have tried almost all the solutions I could find, including fs.s3a.xxx and fs.s3n.xxxx, fs.s3X.access.key and fs.s3X.secret.key (where X is a, n, or nothing), and embedding the credentials in the URL, e.g. sc.textFile("s3X://MY_ACCESS:MY_SECRET/[email protected]"). One of the s3a variants is sketched below.
Upvotes: 1
Views: 1515
Reputation: 73
For those who come across this exception:

Cause: my client node has Spark 2.4.5 + Hadoop 2.7, but the nodes on the cluster run Hadoop 3.2.0, so the root cause is the Hadoop version mismatch. Even though I replaced hadoop-aws-2.7.2.jar with hadoop-aws-3.2.0.jar, it still did not work; the exceptions were 40X errors: InvalidAccessKeyId, Permission denied, etc.
Solution: upgrade Hadoop 2.7 on the client node to Hadoop 3.2.0, put aws-java-sdk-1.12.183.jar and hadoop-aws-3.2.0.jar on the classpath, and set fs.s3a.endpoint, fs.s3a.access.key and fs.s3a.secret.key (not the fs.s3 / awsSecretAccessKey style keys, in my case).
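A minimal sketch of what works for me after the upgrade; the placeholders are the same as in the question, and the path-style setting is my assumption for a non-AWS endpoint rather than something every setup needs:

// Client runs Hadoop 3.2.0 with hadoop-aws-3.2.0.jar and aws-java-sdk-1.12.183.jar on the classpath.
// MY_ENDPOINT / MY_ACCESS / MY_SECRET / MY_BUCKET are placeholders.
sc.hadoopConfiguration.set("fs.s3a.endpoint", "MY_ENDPOINT")
sc.hadoopConfiguration.set("fs.s3a.access.key", "MY_ACCESS")   // not fs.s3.awsAccessKeyId
sc.hadoopConfiguration.set("fs.s3a.secret.key", "MY_SECRET")   // not fs.s3.awsSecretAccessKey
// Assumption: a custom (non-AWS) endpoint usually also needs path-style access enabled.
sc.hadoopConfiguration.set("fs.s3a.path.style.access", "true")

val rdd = sc.textFile("s3a://MY_BUCKET/myTxtOnS3.txt")         // note the s3a:// scheme
println(rdd.count())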
Upvotes: 1