NoobieProgrammer

Reputation: 90

Spark With AWS S3 running with docker - Certificate does not match any of the subject alternative names

I'm running into a weird certificate issue that I've been debugging for days, taking multiple stabs at it.

My application simply uploads a directory to an S3 bucket and then pulls that same directory back down from S3 into a Spark DataFrame.

I'm only using Apache Spark, hadoop-aws, and aws-java-sdk-bundle.

Spark 3.1.1, Scala 2.12, Hadoop 3.2.0, and AWS Java SDK 1.11.901.

When I run my application with Docker I'm able to upload the directory, but when I then attempt to log in and read the directory back I get this stack trace of exceptions (probably just a propagation of the first exception that occurs):

Exception in thread "main" org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://bucket-name/directory: com.amazonaws.SdkClientException: Unable to execute HTTP request: Certificate for <bucket-name.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]: Unable to execute HTTP request: Certificate for <bucket-name.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]

Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: Certificate for <bucket-name.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]

Caused by: javax.net.ssl.SSLPeerUnverifiedException: Certificate for <bucket-name.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]
    at com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory.verifyHostname(SSLConnectionSocketFactory.java:507)
    at com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:437)
    at com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:384)
    at com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
    at com.amazonaws.thirdparty.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)
    at sun.reflect.GeneratedMethodAccessor137.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
    at com.amazonaws.http.conn.$Proxy60.connect(Unknown Source)
    at com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
    at com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
    at com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
    at com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
    at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1333)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)

I find it odd because my colleague is using the same credentials as me, but he does not run into this issue at all.

Any ideas on why this might be happening?

Upvotes: 3

Views: 3883

Answers (1)

ciurlaro

Reputation: 1014

How to read from AWS S3 with Docker, Spark, and Python


Docker image:

  • spark_version=3.3.0
  • hadoop_version=3
  • python_version=3.10.6

PySpark code:

from pyspark import SparkConf
from pyspark.sql import SparkSession

HADOOP_VERSION = '3.3.1'

# Extra dependencies resolved at runtime via spark.jars.packages
packages = [
    f'org.apache.hadoop:hadoop-aws:{HADOOP_VERSION}',
    'com.google.guava:guava:31.1-jre',
    'org.apache.httpcomponents:httpcore:4.4.14',
    'com.google.inject:guice:4.2.2',
    'com.google.inject.extensions:guice-servlet:4.2.2'
]

# `credentials` is a dict of temporary AWS credentials
# (AccessKeyId / SecretAccessKey / SessionToken)
conf = SparkConf().setAll([
    ('spark.jars.packages', ','.join(packages)),
    ('spark.hadoop.fs.s3a.aws.credentials.provider', 'org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider'),
    ('spark.hadoop.fs.s3a.access.key', credentials['AccessKeyId']),
    ('spark.hadoop.fs.s3a.secret.key', credentials['SecretAccessKey']),
    ('spark.hadoop.fs.s3a.session.token', credentials['SessionToken']),
    ('spark.hadoop.fs.s3a.path.style.access', 'true')
])

spark = SparkSession.builder.config(conf=conf).getOrCreate()
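The `credentials` dict isn't defined in the snippet above; one common way to obtain temporary credentials with exactly those keys is STS via boto3 (an assumption on my part, not something shown in the original answer):

import boto3

# Assumption: temporary credentials come from STS; the returned 'Credentials'
# dict carries the AccessKeyId / SecretAccessKey / SessionToken keys used in
# the SparkConf above.
sts = boto3.client('sts')
credentials = sts.get_session_token()['Credentials']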

Thanks @FelipeGonzalez for the spark.hadoop.fs.s3a.path.style.access tip.
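With fs.s3a.path.style.access enabled the client addresses the bucket as s3.amazonaws.com/bucket-name/... instead of bucket-name.s3.amazonaws.com, which avoids exactly the hostname/wildcard-certificate mismatch from the question (commonly triggered by bucket names containing dots, or by a TLS-intercepting proxy). Once the session is configured, reading the directory back is just an s3a path; the bucket name, prefix, and file format below are placeholders:

# Placeholder bucket/prefix and format, adjust to the actual S3 location
df = spark.read.parquet('s3a://bucket-name/directory')
df.show()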

Upvotes: 4
