Reputation: 72001
I'm using Apache HttpClient in a web crawler that is only for crawling public data.
I'd like it to be able to crawl sites with invalid certificates, no matter how invalid.
My crawler won't be passing in any usernames, passwords, etc and no sensitive data is being sent or received.
For this use case, I'd crawl the http version of a site if it exists, but sometimes it doesn't, of course.
How can this be done with Apache's HttpClient?
I tried a few suggestions like this one, but they still fail for some invalid certs, for example:
failed for url:https://dh480.badssl.com/, reason:java.lang.RuntimeException: Could not generate DH keypair
failed for url:https://null.badssl.com/, reason:Received fatal alert: handshake_failure
failed for url:https://rc4-md5.badssl.com/, reason:Received fatal alert: handshake_failure
failed for url:https://rc4.badssl.com/, reason:Received fatal alert: handshake_failure
failed for url:https://superfish.badssl.com/, reason:Connection reset
Note that I've tried this with jdk.tls.disabledAlgorithms in my $JAVA_HOME/jre/lib/security/java.security file set to nothing, to ensure this wasn't the issue, and I still get failures like the above.
Upvotes: 6
Views: 3126
Reputation: 1063
The short answer to your question, which is to specifically trust all certs, would be to use the TrustAllStrategy and do something like this:
SSLContextBuilder sslContextBuilder = new SSLContextBuilder();
sslContextBuilder.loadTrustMaterial(null, new TrustAllStrategy());
SSLConnectionSocketFactory socketFactory = new SSLConnectionSocketFactory(
sslContextBuilder.build());
CloseableHttpClient httpclient = HttpClients.custom().setSSLSocketFactory(
socketFactory).build();
However... an invalid cert may not be your main issue. A handshake_failure can occur for a number of reasons, but in my experience it's usually due to an SSL/TLS version mismatch or a cipher suite negotiation failure. This doesn't mean the SSL cert is "bad"; it's just a mismatch between what the server and the client support. You can see exactly where the handshake is failing using a tool like Wireshark.
While Wireshark can be great to see where it's failing, it won't help you come up with a solution. Whenever I've gone about debugging handshake_failures in the past I've found this tool particularly helpful: https://testssl.sh/
You can point that script at any of your failing websites to learn more about what protocols are available on that target and what your client needs to support in order to establish a successful handshake. It will also print information about the certificate.
For example (showing only two sections of the output of testssl.sh):
./testssl.sh www.google.com
....
Testing protocols (via sockets except TLS 1.2, SPDY+HTTP2)
SSLv2 not offered (OK)
SSLv3 not offered (OK)
TLS 1 offered
TLS 1.1 offered
TLS 1.2 offered (OK)
....
Server Certificate #1
Signature Algorithm SHA256 with RSA
Server key size RSA 2048 bits
Common Name (CN) "www.google.com"
subjectAltName (SAN) "www.google.com"
Issuer "Google Internet Authority G3" ("Google Trust Services" from "US")
Trust (hostname) Ok via SAN and CN (works w/o SNI)
Chain of trust "/etc/*.pem" cannot be found / not readable
Certificate Expiration expires < 60 days (58) (2018-10-30 06:14 --> 2019-01-22 06:14 -0700)
....
Testing all 102 locally available ciphers against the server, ordered by encryption strength
(Your /usr/bin/openssl cannot show DH/ECDH bits)
Hexcode Cipher Suite Name (OpenSSL) KeyExch. Encryption Bits
------------------------------------------------------------------------
xc030 ECDHE-RSA-AES256-GCM-SHA384 ECDH AESGCM 256
xc02c ECDHE-ECDSA-AES256-GCM-SHA384 ECDH AESGCM 256
xc014 ECDHE-RSA-AES256-SHA ECDH AES 256
xc00a ECDHE-ECDSA-AES256-SHA ECDH AES 256
x9d AES256-GCM-SHA384 RSA AESGCM 256
x35 AES256-SHA RSA AES 256
xc02f ECDHE-RSA-AES128-GCM-SHA256 ECDH AESGCM 128
xc02b ECDHE-ECDSA-AES128-GCM-SHA256 ECDH AESGCM 128
xc013 ECDHE-RSA-AES128-SHA ECDH AES 128
xc009 ECDHE-ECDSA-AES128-SHA ECDH AES 128
x9c AES128-GCM-SHA256 RSA AESGCM 128
x2f AES128-SHA RSA AES 128
x0a DES-CBC3-SHA RSA 3DES 168
So using this output we can see that if your client only supported SSLv3, the handshake would fail because that protocol isn't offered by the server. The protocol offering is unlikely to be the problem, but you can double-check what your Java client supports by getting the list of enabled protocols. You can provide an overridden implementation of the SSLConnectionSocketFactory from the code snippet above to print the enabled and supported protocols and cipher suites of each SSLSocket, as follows:
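class MySSLConnectionSocketFactory extends SSLConnectionSocketFactory {
// SSLConnectionSocketFactory has no no-arg constructor, so pass the context through.
MySSLConnectionSocketFactory(SSLContext sslContext) {
super(sslContext);
}
// Called by HttpClient before the handshake; used here only to log what the socket offers.
@Override
protected void prepareSocket(SSLSocket socket) throws IOException {
System.out.println("Supported Ciphers: " + Arrays.toString(socket.getSupportedCipherSuites()));
System.out.println("Supported Protocols: " + Arrays.toString(socket.getSupportedProtocols()));
System.out.println("Enabled Ciphers: " + Arrays.toString(socket.getEnabledCipherSuites()));
System.out.println("Enabled Protocols: " + Arrays.toString(socket.getEnabledProtocols()));
}
}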
I often encounter handshake_failure when there is a cipher suite negotiation failure. To avoid this error, your client's list of supported cipher suites must contain at least one match to a cipher suite from the server's list of supported cipher suites.
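To make the "at least one match" rule concrete, here is a minimal JDK-only sketch that intersects the cipher suites enabled on your JVM with a server's list. The class name CipherOverlap, the helper sharedSuites, and the two-entry server list are made up for illustration. Note that JSSE uses IANA names (TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) while testssl.sh prints OpenSSL names (ECDHE-RSA-AES256-GCM-SHA384), so the server list has to be translated by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.net.ssl.SSLContext;

final class CipherOverlap {

    // Returns the suites present in both lists, preserving the client's order.
    static List<String> sharedSuites(String[] clientSuites, String[] serverSuites) {
        Set<String> server = new LinkedHashSet<>(Arrays.asList(serverSuites));
        List<String> shared = new ArrayList<>();
        for (String suite : clientSuites) {
            if (server.contains(suite)) {
                shared.add(suite);
            }
        }
        return shared;
    }

    public static void main(String[] args) throws Exception {
        // Suites the default JSSE context enables on this JVM.
        String[] client = SSLContext.getDefault().getDefaultSSLParameters().getCipherSuites();
        // Hypothetical server list, written with IANA names (assumption: translated
        // by hand from the OpenSSL names in the testssl.sh output).
        String[] server = {
            "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
            "TLS_RSA_WITH_AES_128_CBC_SHA"
        };
        List<String> shared = sharedSuites(client, server);
        System.out.println(shared.isEmpty()
                ? "No common cipher suite: expect a handshake_failure"
                : "Common cipher suites: " + shared);
    }
}
```

If the result is empty for one of your failing hosts, the fix is on the cipher side (enabling more suites on the client), not on the trust side.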
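If the server requires AES256-based cipher suites, you probably need the Java Cryptography Extension (JCE) unlimited strength policy files. These have historically been subject to export restrictions, so they may not be available to someone outside the US. On recent JDKs (8u161 and later, as far as I know) unlimited-strength cryptography is enabled by default, so this mainly affects older installations.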
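A quick way to check whether your JVM is limited in this way is to ask it for the maximum AES key length it allows. This is a small sketch using javax.crypto.Cipher.getMaxAllowedKeyLength; the class name AesKeyLimit and the helper maxAesKeyBits are just illustrative names.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;

final class AesKeyLimit {

    // Returns the largest AES key size, in bits, that the JVM's crypto policy allows.
    static int maxAesKeyBits() {
        try {
            return Cipher.getMaxAllowedKeyLength("AES");
        } catch (NoSuchAlgorithmException e) {
            // AES is a mandatory algorithm on standard JVMs, so this is not expected.
            throw new IllegalStateException("AES is not available on this JVM", e);
        }
    }

    public static void main(String[] args) {
        int bits = maxAesKeyBits();
        System.out.println("Max allowed AES key length: " + bits);
        if (bits < 256) {
            System.out.println("AES256 cipher suites are not usable with the current policy files");
        }
    }
}
```

A value below 256 suggests the restricted policy is in effect; an unrestricted JVM reports a much larger number.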
More on cryptography restrictions, if you're interested: https://crypto.stackexchange.com/questions/20524/why-there-are-limitations-on-using-encryption-with-keys-beyond-certain-length
Upvotes: 6
Reputation: 2101
You can do it with the core JDK too, but IIRC HttpClient also allows you to set the SSL socket factory.
The factory defines and uses an SSL context that you construct with a trust manager. That manager would simply not verify the cert chain, as shown in the post above.
You also need a HostnameVerifier instance that chooses to ignore a potential mismatch between the cert's hostname and the URL's host (or IP). Otherwise, it would still fail even if the cert signer is blindly trusted.
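For reference, the core-JDK variant of those two pieces (a trust manager that checks nothing, plus a hostname verifier that accepts anything) could look roughly like this. It is only a sketch: the class name InsecureJdkClient and its members are invented for the example, and it uses HttpsURLConnection instead of HttpClient.

```java
import java.net.URI;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

final class InsecureJdkClient {

    // Trust manager that accepts every certificate chain without checking it.
    static final X509TrustManager TRUST_ALL = new X509TrustManager() {
        @Override public void checkClientTrusted(X509Certificate[] chain, String authType) { }
        @Override public void checkServerTrusted(X509Certificate[] chain, String authType) { }
        @Override public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
    };

    // Hostname verifier that accepts any mismatch between the certificate and the URL host.
    static final HostnameVerifier ANY_HOST = (hostname, session) -> true;

    // Builds an SSLContext whose only trust manager is TRUST_ALL.
    static SSLContext insecureContext() throws Exception {
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, new TrustManager[] { TRUST_ALL }, new SecureRandom());
        return context;
    }

    // Prepares (but does not yet connect) an HTTPS connection with both checks disabled.
    static HttpsURLConnection open(String url) throws Exception {
        HttpsURLConnection conn = (HttpsURLConnection) URI.create(url).toURL().openConnection();
        conn.setSSLSocketFactory(insecureContext().getSocketFactory());
        conn.setHostnameVerifier(ANY_HOST);
        return conn;
    }

    public static void main(String[] args) throws Exception {
        // The URL comes from the command line, e.g. one of the badssl.com hosts in the question.
        HttpsURLConnection conn = open(args[0]);
        System.out.println("HTTP status: " + conn.getResponseCode());
    }
}
```

This only removes the certificate and hostname checks. Protocol or cipher mismatches such as the handshake_failure cases in the question would still fail.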
I have converted many client stacks to 'accept self-signed' and it's quite easy in most of them. The worst case is when the 3rd-party lib doesn't allow choosing an SSL socket factory instance but only its class name. In that case, I use a ThreadLocalSSLSocketFactory which doesn't own any actual factory but simply looks up a thread-local to find one that the upper stack frames (which you control) have prepared. This only works if the 3rd-party lib is not doing the work on a distinct thread, of course. I know HttpClient can be told to use a specific SSL socket factory, so there it is easy.
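A rough sketch of the thread-local idea, in case it helps. This is not a library class; the static set/clear helpers are names I chose for the example, and the static getDefault() is there on the assumption that the 3rd-party lib instantiates the factory that way, which many class-name-based APIs do.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.Socket;
import javax.net.SocketFactory;
import javax.net.ssl.SSLSocketFactory;

// A factory with no state of its own: every call is forwarded to whatever
// factory the current thread has registered beforehand.
final class ThreadLocalSSLSocketFactory extends SSLSocketFactory {

    private static final ThreadLocal<SSLSocketFactory> CURRENT = new ThreadLocal<>();

    // Called by your own code higher up the stack, before invoking the library.
    static void set(SSLSocketFactory factory) { CURRENT.set(factory); }

    // Called in a finally block once the library call has returned.
    static void clear() { CURRENT.remove(); }

    // Assumption: the library obtains the factory through a static getDefault().
    public static SocketFactory getDefault() { return new ThreadLocalSSLSocketFactory(); }

    private static SSLSocketFactory delegate() {
        SSLSocketFactory factory = CURRENT.get();
        if (factory == null) {
            throw new IllegalStateException("No SSLSocketFactory registered for this thread");
        }
        return factory;
    }

    @Override public String[] getDefaultCipherSuites() { return delegate().getDefaultCipherSuites(); }
    @Override public String[] getSupportedCipherSuites() { return delegate().getSupportedCipherSuites(); }
    @Override public Socket createSocket() throws IOException { return delegate().createSocket(); }

    @Override
    public Socket createSocket(Socket s, String host, int port, boolean autoClose) throws IOException {
        return delegate().createSocket(s, host, port, autoClose);
    }

    @Override
    public Socket createSocket(String host, int port) throws IOException {
        return delegate().createSocket(host, port);
    }

    @Override
    public Socket createSocket(String host, int port, InetAddress localHost, int localPort) throws IOException {
        return delegate().createSocket(host, port, localHost, localPort);
    }

    @Override
    public Socket createSocket(InetAddress host, int port) throws IOException {
        return delegate().createSocket(host, port);
    }

    @Override
    public Socket createSocket(InetAddress address, int port, InetAddress localAddress, int localPort) throws IOException {
        return delegate().createSocket(address, port, localAddress, localPort);
    }
}
```

Typical usage would be to call ThreadLocalSSLSocketFactory.set(yourInsecureFactory) just before the library call, pass the class name to the library, and call clear() in a finally block.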
Also take the time to read the JSSE doc, it is totally worth the time it takes to read.
Upvotes: 0
Reputation: 27538
I think @nmorenor's answer is pretty close to the mark. What I would do in addition is explicitly enable SSLv3 (HttpClient disables it by default due to security concerns) and disable host name verification.
SSLContext sslContext = SSLContexts.custom()
.loadTrustMaterial((chain, authType) -> true)
.build();
CloseableHttpClient client = HttpClients.custom()
.setSSLSocketFactory(new SSLConnectionSocketFactory(sslContext,
new String[]{"SSLv3", "TLSv1", "TLSv1.1", "TLSv1.2"},
null,
NoopHostnameVerifier.INSTANCE))
.build();
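One caveat with a hard-coded protocol array: JSSE rejects protocol names it does not support with an IllegalArgumentException when they are enabled on the socket, and not every JVM supports the same set. A small hedged sketch that filters the requested names against what the default SSLContext reports as supported (ProtocolFilter and supportedOnly are names invented for this example):

```java
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;

final class ProtocolFilter {

    // Keeps only the requested protocol names that this JVM's JSSE provider supports.
    static String[] supportedOnly(String... requested) {
        List<String> supported;
        try {
            supported = Arrays.asList(
                    SSLContext.getDefault().getSupportedSSLParameters().getProtocols());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
        List<String> usable = new ArrayList<>();
        for (String protocol : requested) {
            if (supported.contains(protocol)) {
                usable.add(protocol);
            }
        }
        return usable.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // The same list as in the snippet above, reduced to what this JVM can accept.
        System.out.println(Arrays.toString(
                supportedOnly("SSLv3", "TLSv1", "TLSv1.1", "TLSv1.2")));
    }
}
```

The resulting array could be passed as the protocols argument of SSLConnectionSocketFactory. Note that this does not look at jdk.tls.disabledAlgorithms, so a protocol can be reported as supported and still be refused during the handshake.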
Upvotes: 0
Reputation: 4604
If you are fine with using other open source libraries like Netty, then the following is worth trying:
SslProvider provider = SslProvider.JDK; // If you are not concerned about http2 / http1.1 then JDK provider will be enough
// Netty's builder returns io.netty.handler.ssl.SslContext, not javax.net.ssl.SSLContext
SslContext sslCtx = SslContextBuilder.forClient()
.sslProvider(provider)
.trustManager(InsecureTrustManagerFactory.INSTANCE) // This will trust all certs
... // Any other required parameters used for ssl context.e.g. protocols , ciphers etc.
.build();
I have used the following version of Netty for trusting any certificate with the code above:
<dependency>
<groupId>io.netty</groupId>
<artifactId>netty-all</artifactId>
<version>4.1.29.Final</version>
</dependency>
Upvotes: 0
Reputation: 177
I think that the post you are referring to is very close to what needs to be done. Have you tried something like:
HttpClientBuilder clientBuilder = HttpClientBuilder.create();
SSLContextBuilder sslContextBuilder = SSLContextBuilder.create();
sslContextBuilder.setSecureRandom(new java.security.SecureRandom());
try {
sslContextBuilder.loadTrustMaterial(new TrustStrategy() {
@Override
public boolean isTrusted(X509Certificate[] arg0, String arg1) throws CertificateException {
return true;
}
});
clientBuilder.setSSLContext(sslContextBuilder.build());
} catch (Throwable t) {
Logger.getLogger(getClass().getName()).log(Level.SEVERE, "Can't set ssl context", t);
}
CloseableHttpClient apacheHttpClient = clientBuilder.build();
I have not tried this code but hopefully it could work.
Cheers
Upvotes: 0