Yassir S

Reputation: 1042

How to run Apache Spark with S3 (Minio) secured with a self-signed certificate?

I installed Minio in Kubernetes using Helm, with TLS enabled via a self-signed certificate. Previously I was able to run my Spark job against Minio without TLS. Now it is no longer possible to connect to Minio (as expected, since the certificate is not trusted).

Then, I created a truststore file from the TLS certificate:

keytool -import \
  -alias tls \
  -file tls.crt \
  -keystore truststore.jks \
  -storepass "$minioTruststorePass" \
  -noprompt

I created a Kubernetes secret with the contents of the truststore, and in spark-defaults.conf I use the following option to let Spark mount it:

spark.kubernetes.driver.secrets.minio-truststore-secret
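
For reference, the full form of that option takes the mount path as its value, and the same property exists for executors; a sketch, assuming the secret was created from the truststore file and using the mount path from the extraJavaOptions below:

# create the secret from the truststore file
kubectl create secret generic minio-truststore-secret \
  --from-file=truststore.jks

# spark-defaults.conf: mount the secret into driver and executor pods
spark.kubernetes.driver.secrets.minio-truststore-secret      /opt/spark/conf/minio/truststore
spark.kubernetes.executor.secrets.minio-truststore-secret    /opt/spark/conf/minio/truststore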

Finally, I made all of the following changes in my spark-defaults.conf, but the problem remains:

spark.hadoop.fs.s3a.endpoint                                      https://smart-agriculture-minio:9000
spark.hadoop.fs.s3.awsAccessKeyId                                 <s3aAccessKey>
spark.hadoop.fs.s3.awsSecretAccessKey                             <s3aSecretKey>
spark.hadoop.fs.s3.impl                                           org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.access.key                                    <s3aAccessKey>
spark.hadoop.fs.s3a.secret.key                                    <s3aSecretKey>
spark.hadoop.fs.s3a.path.style.access                             true
spark.hadoop.fs.s3a.impl                                          org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.connection.ssl.enabled                        true
spark.driver.extraJavaOptions                                      -Djavax.net.ssl.trustStore=/opt/spark/conf/minio/truststore/truststore.jks -Djavax.net.ssl.trustStorePassword=<minioTruststorePass>
spark.executor.extraJavaOptions                                   -Djavax.net.ssl.trustStore=/opt/spark/conf/minio/truststore/truststore.jks -Djavax.net.ssl.trustStorePassword=<minioTruststorePass>

Have you ever faced this problem, and do you have an idea how to solve it?

Thanks

Upvotes: 5

Views: 6938

Answers (3)

Hannah Ritter

Reputation: 51

Quite late, but I got the Hadoop S3/AWS connector to work with a self-signed certificate by importing it into the default Java truststore via:

keytool -import -trustcacerts -alias certalias \
  -noprompt -file /path/to/cert.crt \
  -keystore $JAVA_HOME/jre/lib/security/cacerts \
  -storepass changeit

changeit is the default password for the Java cacerts keystore.
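
Note: on Java 9 and newer there is no jre subdirectory, so (assuming a standard JDK layout) the same import would target the keystore directly under the JDK:

keytool -import -trustcacerts -alias certalias \
  -noprompt -file /path/to/cert.crt \
  -keystore $JAVA_HOME/lib/security/cacerts \
  -storepass changeit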

Upvotes: 3

Hamza Mourad

Reputation: 23

javaOptions: "-Dcom.amazonaws.sdk.disableCertChecking=true"

Worked for me for Hive!

Upvotes: 1

Taras Tereshchenko

Reputation: 1

Spark uses the Hadoop libraries, which in turn use the AWS SDK, so you can disable its certificate checking:

com.amazonaws.sdk.disableCertChecking=true

As I understand it, you want an answer for Kubernetes + the Spark Operator. Just add this property to the driver and executor sections of your YAML file:

javaOptions: "-Dcom.amazonaws.sdk.disableCertChecking=true"
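
For illustration, a minimal sketch of where this lands in a SparkApplication manifest (the API version and app name are assumptions; other required fields are omitted):

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: my-spark-app            # hypothetical name
spec:
  # image, mainApplicationFile, etc. omitted
  driver:
    javaOptions: "-Dcom.amazonaws.sdk.disableCertChecking=true"
  executor:
    javaOptions: "-Dcom.amazonaws.sdk.disableCertChecking=true"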

FYI: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/user-guide.md#specifying-extra-java-options

https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-core/src/main/java/com/amazonaws/SDKGlobalConfiguration.java#L29-L34

Upvotes: 0
