ruseel

Reputation: 1734

spark 2.3.0, aws-sdk-java 1.7.4 - s3a read failed with AmazonS3Exception Bad Request?

While using Spark 2.3.0 with hadoop-aws 2.7.6, I tried to read from S3:

spark.sparkContext.textFile("s3a://ap-northeast-2-bucket/file-1").take(10)

But an AmazonS3Exception was raised:

com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: 202ABEDF0E955321, AWS Error Code: null, AWS Error Message: Bad Request
  at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
  at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
  at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
  at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
  ...

I launched the EC2 instance with an instance profile, so the AWS SDK uses the instance profile credentials, and from the console I can use the AWS CLI successfully:

aws s3 ls ap-northeast-2-bucket 
aws s3 cp s3://ap-northeast-2-bucket/file-a file-a 

I did set fs.s3a.endpoint to s3.ap-northeast-2.amazonaws.com in spark-defaults.conf:

# spark-defaults.conf
spark.hadoop.fs.s3a.endpoint    s3.ap-northeast-2.amazonaws.com

Upvotes: 1

Views: 1506

Answers (1)

ruseel

Reputation: 1734

This was caused by a combination of several factors.

I was using Spark 2.3.0 with Hadoop 2.7, so I was using hadoop-aws 2.7.6, which pulls in aws-java-sdk 1.7.4 as a dependency.

My bucket is located in Seoul (ap-northeast-2), and the Seoul and Frankfurt regions support only the V4 signing mechanism. So the endpoint must be set for the AWS SDK to use V4 properly. This can be done by setting the Hadoop configuration:

spark.hadoop.fs.s3a.endpoint    s3.ap-northeast-2.amazonaws.com

Also, aws-java-sdk versions released before June 2016 use the V2 signing mechanism by default, so the SDK must be told explicitly to use V4. This can be done by setting a Java system property:

import com.amazonaws.SDKGlobalConfiguration
System.setProperty(SDKGlobalConfiguration.ENABLE_S3_SIGV4_SYSTEM_PROPERTY, "true")
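System.setProperty only affects the JVM it runs in. As a sketch, assuming the job runs on a cluster where the executors are separate JVMs (an assumption on my part, not something tested here), the same property could instead be passed as a JVM option in spark-defaults.conf. The name com.amazonaws.services.s3.enableV4 is the string value behind SDKGlobalConfiguration.ENABLE_S3_SIGV4_SYSTEM_PROPERTY:

```
# spark-defaults.conf
# Assumption: driver and executors both need the V4 flag
spark.driver.extraJavaOptions      -Dcom.amazonaws.services.s3.enableV4=true
spark.executor.extraJavaOptions    -Dcom.amazonaws.services.s3.enableV4=true
```

If these options are already used for other flags, the -D entry would need to be appended to the existing value rather than replacing it.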

Unless both fixes are applied, the Bad Request error occurs.
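Putting the two fixes together, a minimal standalone sketch might look like the following. The object name S3aV4Read and the app name are hypothetical, the bucket and file are the ones from the question, and it assumes the job is launched with spark-submit with hadoop-aws 2.7.6 and aws-java-sdk 1.7.4 on the classpath. The system property is set first so that it is in place before the S3A filesystem is created.

```scala
import com.amazonaws.SDKGlobalConfiguration
import org.apache.spark.sql.SparkSession

// Hypothetical object name, for illustration only.
object S3aV4Read {
  def main(args: Array[String]): Unit = {
    // Fix 2: make the old SDK (1.7.4) sign requests with V4.
    // This only affects the current JVM (the driver).
    System.setProperty(SDKGlobalConfiguration.ENABLE_S3_SIGV4_SYSTEM_PROPERTY, "true")

    // Fix 1: region-specific endpoint; spark.hadoop.* settings are
    // copied into the Hadoop configuration used by s3a.
    val spark = SparkSession.builder()
      .appName("s3a-v4-read")
      .config("spark.hadoop.fs.s3a.endpoint", "s3.ap-northeast-2.amazonaws.com")
      .getOrCreate()

    // Same read as in the question, printing the first 10 lines.
    spark.sparkContext
      .textFile("s3a://ap-northeast-2-bucket/file-1")
      .take(10)
      .foreach(println)

    spark.stop()
  }
}
```

In spark-shell the session already exists, so only the System.setProperty line and the endpoint setting from spark-defaults.conf are needed before the first s3a read.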

Upvotes: 1
