Reputation: 90
I'm running into a weird certificate issue that I've been debugging for days, taking multiple stabs at it.
My application simply uploads a directory to an S3 bucket and then reads that same directory back from the bucket into a Spark DataFrame.
I'm only using Apache Spark, hadoop-aws, and aws-java-sdk-bundle.
Spark version 3.1.1, Scala version 2.12, Hadoop version 3.2.0, and AWS Java SDK version 1.11.901.
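Roughly, the read that fails looks like this (a PySpark-style sketch for brevity; the bucket name, directory, Parquet format, and placeholder keys are illustrative, not my real setup):

from pyspark.sql import SparkSession

# Rough sketch of the failing read. hadoop-aws and aws-java-sdk-bundle are
# assumed to be on the classpath; all names below are placeholders.
spark = (
    SparkSession.builder
    .appName('s3a-read-sketch')
    .config('spark.hadoop.fs.s3a.access.key', '<access-key>')
    .config('spark.hadoop.fs.s3a.secret.key', '<secret-key>')
    .getOrCreate()
)

# The getFileStatus call on s3a://bucket-name/directory is what blows up.
df = spark.read.parquet('s3a://bucket-name/directory')
df.show()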
When I run my application with Docker I'm able to upload the directory, but when I then attempt to authenticate and read the directory back I get this stack trace (probably just a propagation of the first exception that occurs):
Exception in thread "main" org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://bucket-name/directory: com.amazonaws.SdkClientException: Unable to execute HTTP request: Certificate for <bucket-name.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]: Unable to execute HTTP request: Certificate for <bucket-name.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]
Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: Certificate for <bucket-name.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]
Caused by: javax.net.ssl.SSLPeerUnverifiedException: Certificate for <bucket-name.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]
    at com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory.verifyHostname(SSLConnectionSocketFactory.java:507)
    at com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:437)
    at com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:384)
    at com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
    at com.amazonaws.thirdparty.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)
    at sun.reflect.GeneratedMethodAccessor137.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
    at com.amazonaws.http.conn.$Proxy60.connect(Unknown Source)
    at com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
    at com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
    at com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
    at com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
    at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1333)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
What I find odd is that my colleague is using the exact same credentials, but he does not run into this issue at all.
Any ideas on why this might be happening?
Upvotes: 3
Views: 3883
Reputation: 1014
For reference, here is the environment and configuration that works for me:
spark_version=3.3.0
hadoop_version=3
python_version=3.10.6
from pyspark import SparkConf
from pyspark.sql import SparkSession

HADOOP_VERSION = '3.3.1'

packages = [
    f'org.apache.hadoop:hadoop-aws:{HADOOP_VERSION}',
    'com.google.guava:guava:31.1-jre',
    'org.apache.httpcomponents:httpcore:4.4.14',
    'com.google.inject:guice:4.2.2',
    'com.google.inject.extensions:guice-servlet:4.2.2',
]

# `credentials` holds temporary credentials (AccessKeyId, SecretAccessKey,
# SessionToken), which is why TemporaryAWSCredentialsProvider is used below.
conf = SparkConf().setAll([
    ('spark.jars.packages', ','.join(packages)),
    ('spark.hadoop.fs.s3a.aws.credentials.provider',
     'org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider'),
    ('spark.hadoop.fs.s3a.access.key', credentials['AccessKeyId']),
    ('spark.hadoop.fs.s3a.secret.key', credentials['SecretAccessKey']),
    ('spark.hadoop.fs.s3a.session.token', credentials['SessionToken']),
    # Path-style access is what avoids the certificate error.
    ('spark.hadoop.fs.s3a.path.style.access', 'true'),
])

spark = SparkSession.builder.config(conf=conf).getOrCreate()
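With that session in place, reading the directory back is just an ordinary s3a:// read. A quick usage sketch (the bucket name, prefix, and Parquet format are placeholders):

# Usage sketch: the bucket, prefix, and format below are placeholders.
df = spark.read.parquet('s3a://bucket-name/directory')
df.show(5)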
Thanks @FelipeGonzalez for the spark.hadoop.fs.s3a.path.style.access tip. With path-style access the client requests s3.amazonaws.com/bucket-name instead of bucket-name.s3.amazonaws.com, so a bucket name containing dots no longer falls outside the *.s3.amazonaws.com wildcard certificate (which only covers a single DNS label).
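If you already have a running SparkSession and only want to flip that one flag, a small sketch of applying the same setting to the existing Hadoop configuration (it has to happen before the S3A filesystem is first used, and _jsc is technically a private attribute):

# Sketch: enable path-style access on an already-created session.
spark.sparkContext._jsc.hadoopConfiguration().set('fs.s3a.path.style.access', 'true')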
Upvotes: 4