Reputation: 339
Is there a way to connect to Elasticsearch (https://IP:9200) from PySpark without certificate verification?
I didn't see any setting for this in the configuration reference: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html
Curl works fine:
curl -iv --insecure --user admin:password --noproxy '*' https://xx.xxx.xx.xxx:9200
* Trying xx.xxx.xx.xxx:9200...
* Connected to xx.xxx.xx.xxx (xx.xxx.xx.xxx) port 9200 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /opt/conda/ssl/cacert.pem
CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
* subject: OU=dev-1; CN=dev-1-es-http.es-dev.es.local
* start date: Jul 9 00:47:04 2020 GMT
* expire date: Jul 9 00:57:04 2021 GMT
* issuer: OU=dev-1; CN=dev-1-http
* SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* Server auth using Basic with user 'admin'
> GET / HTTP/1.1
> Host: xx.xxx.xx.xxx:9200
> Authorization: Basic abc==
> User-Agent: curl/7.71.1
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< content-type: application/json; charset=UTF-8
content-type: application/json; charset=UTF-8
< content-length: 544
content-length: 544
<
{
"name" : "dev-1-es-hot-2",
"cluster_name" : "dev-1",
"cluster_uuid" : "444",
"version" : {
"number" : "7.8.0",
"build_flavor" : "default",
"build_type" : "docker",
"build_hash" : "123",
"build_date" : "2020-06-14T19:35:50.234439Z",
"build_snapshot" : false,
"lucene_version" : "8.5.1",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
* Connection #0 to host xx.xxx.xx.xxx left intact
My latest attempt:
reader = spark.read.format("org.elasticsearch.spark.sql") \
    .option("es.port", "9200") \
    .option("spark.es.net.http.auth.user", "admin") \
    .option("spark.es.net.http.auth.pass", "password") \
    .option("es.net.ssl", "true") \
    .option("es.net.ssl.cert.allow.self.signed", "true") \
    .option("es.net.proxy.https.use.system.props", "false") \
    .option("es.nodes", "xx.xxx.xxx.xxx")
It ends with:
: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:348)
at org.elasticsearch.spark.sql.ElasticsearchRelation.cfg$lzycompute(DefaultSource.scala:225)
at org.elasticsearch.spark.sql.ElasticsearchRelation.cfg(DefaultSource.scala:223)
at org.elasticsearch.spark.sql.ElasticsearchRelation.lazySchema$lzycompute(DefaultSource.scala:229)
at org.elasticsearch.spark.sql.ElasticsearchRelation.lazySchema(DefaultSource.scala:229)
at org.elasticsearch.spark.sql.ElasticsearchRelation.$anonfun$schema$1(DefaultSource.scala:233)
at scala.Option.getOrElse(Option.scala:189)
at org.elasticsearch.spark.sql.ElasticsearchRelation.schema(DefaultSource.scala:233)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:448)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:326)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:308)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:308)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:240)
at jdk.internal.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopTransportException: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:128)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:432)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:428)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:388)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:392)
at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:168)
at org.elasticsearch.hadoop.rest.RestClient.mainInfo(RestClient.java:745)
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:338)
... 23 more
Caused by: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131)
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:349)
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:292)
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:287)
at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1356)
at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.onConsumeCertificate(CertificateMessage.java:1231)
at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.consume(CertificateMessage.java:1174)
at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:392)
at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:443)
at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:421)
at java.base/sun.security.ssl.TransportContext.dispatch(TransportContext.java:182)
at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:171)
at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1408)
at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1314)
at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:440)
at java.base/sun.security.ssl.SSLSocketImpl.ensureNegotiated(SSLSocketImpl.java:819)
at java.base/sun.security.ssl.SSLSocketImpl$AppOutputStream.write(SSLSocketImpl.java:1189)
at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
at java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142)
at org.apache.commons.httpclient.HttpConnection.flushRequestOutputStream(HttpConnection.java:828)
at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2116)
at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.doExecute(CommonsHttpTransport.java:685)
at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:664)
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:116)
... 30 more
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at java.base/sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:439)
at java.base/sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:306)
at java.base/sun.security.validator.Validator.validate(Validator.java:264)
at java.base/sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:313)
at java.base/sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:233)
at java.base/sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:110)
at org.elasticsearch.hadoop.rest.commonshttp.SSLSocketFactory$TrustManagerDelegate.checkServerTrusted(SSLSocketFactory.java:76)
at java.base/sun.security.ssl.AbstractTrustManagerWrapper.checkServerTrusted(SSLContextImpl.java:1510)
at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1340)
... 54 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at java.base/sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
at java.base/sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
at java.base/java.security.cert.CertPathBuilder.build(CertPathBuilder.java:297)
at java.base/sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:434)
... 62 more
I also tried providing a keystore/truststore, without success.
Upvotes: 1
Views: 2281
Reputation: 126
I had the same issue as you, but in Java. As already stated in the comments below the question, the problem is that the client needs to trust the server certificates in order to complete the TLS handshake.
I used the tool suggested by lubom, added the certificates to a .jks truststore, set a password on it, and finally copied the truststore to all the worker nodes of the Spark cluster.
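The comment thread is not reproduced here, so the exact tool lubom suggested is not shown; purely as an illustration, one common way to grab the server certificate and build such a .jks truststore is openssl plus the JDK's keytool (the host, file names and password below are placeholders):

openssl s_client -connect xx.xxx.xx.xxx:9200 -showcerts </dev/null | openssl x509 -outform PEM > es-server.crt
keytool -importcert -noprompt -alias es-server -file es-server.crt -keystore truststore.jks -storepass <truststore password>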
Then I added the following properties:
conf.put("spark.es.net.ssl.keystore.location", "<truststore path>");
conf.put("spark.es.net.ssl.keystore.pass","<truststore password>";
With this in place I was able to communicate with the Elasticsearch server.
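For the PySpark setup from the question, a rough equivalent is sketched below, passing the same settings as reader options rather than Spark conf entries. This is only a sketch under assumptions: the truststore must be available at the same path on the driver and every worker node, and the truststore path, password and index name are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("es-ssl-read").getOrCreate()

df = (
    spark.read.format("org.elasticsearch.spark.sql")
    .option("es.nodes", "xx.xxx.xx.xxx")
    .option("es.port", "9200")
    .option("es.net.ssl", "true")
    .option("es.net.http.auth.user", "admin")
    .option("es.net.http.auth.pass", "password")
    # JKS truststore containing the self-signed server/CA certificate,
    # present at the same path on all worker nodes (placeholders below)
    .option("es.net.ssl.keystore.location", "<truststore path>")
    .option("es.net.ssl.keystore.pass", "<truststore password>")
    .load("<index name>")
)
df.show()

elasticsearch-hadoop also documents es.net.ssl.truststore.location / es.net.ssl.truststore.pass for trust-only material, which may be worth trying instead of the keystore.* variants used above.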
P.S. If this answer is useful, don't thank me but @lubom, who solved my issue in the comment thread below the question! I posted this answer (after asking permission) just to write down clearly how we both solved our common issue.
Upvotes: 2