Reputation: 25

javax.net.ssl.SSLHandshakeException for some HTTPS URLs in Nutch 1.13

I am crawling seed URLs that are a mix of HTTP and HTTPS, but for a few of the HTTPS URLs I get the error below:

FetcherThread INFO api.HttpRobotRulesParser (168) - Couldn't get robots.txt for https://corporate.douglas.de/investors/?lang=en: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

On the other hand, https://www.integrafin.co.uk/annual-reports/ is crawled without any problem.

Upvotes: 0

Answers (2)

Jorge Luis

Reputation: 3253

You could try a more recent version of Nutch, or build directly from master, and then try the http.tls.certificates.check setting introduced in https://github.com/apache/nutch/pull/388. This essentially lets you skip TLS/SSL certificate verification.
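To illustrate what "skipping verification" means at the JVM level, here is a minimal sketch of an SSLContext that trusts every certificate. This is not Nutch's code; the class name InsecureTls and the method trustAllContext are hypothetical names used only for this example. It is only suitable for debugging, since it removes protection against man-in-the-middle attacks.

```java
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper, not part of Nutch: shows what disabling
// certificate checks amounts to in plain JSSE.
class InsecureTls {

    /** Builds an SSLContext whose trust manager accepts every certificate chain. */
    static SSLContext trustAllContext() throws GeneralSecurityException {
        TrustManager[] trustAll = { new X509TrustManager() {
            @Override
            public void checkClientTrusted(X509Certificate[] chain, String authType) {
                // Intentionally empty: no validation is performed.
            }

            @Override
            public void checkServerTrusted(X509Certificate[] chain, String authType) {
                // Intentionally empty: an untrusted or self-signed chain passes.
            }

            @Override
            public X509Certificate[] getAcceptedIssuers() {
                return new X509Certificate[0];
            }
        } };

        SSLContext ctx = SSLContext.getInstance("TLS");
        // No key managers, the trust-all manager, default SecureRandom.
        ctx.init(null, trustAll, null);
        return ctx;
    }
}
```

With a context like this, the PKIX path building step that fails in the question is never run, which is why the handshake would succeed.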

Upvotes: 0

JosemyAB

Reputation: 407

I think you need to add the certificate of the server https://corporate.douglas.de/investors/?lang=en to the "cacerts" file of the JVM that runs your code.

First, download the certificate using Chrome:

Then click the "Details" tab and then the "Copy to file" button.

In the wizard, select the option "DER binary.... (.CER)"

Now you can use the tool "portecle" (http://portecle.sourceforge.net/) to add the certificate to the cacerts file of your JVM, following these steps: http://portecle.sourceforge.net/import-trusted-cert.html
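If you prefer not to use a GUI tool, the same import can be sketched with the standard java.security.KeyStore API. This is only a rough sketch under a few assumptions: the class and method names (TrustStoreImport, emptyStore, addTrustedCert, importIntoFile) are made up for this example, the keystore is assumed to load with the JVM's default keystore type, and the path and password of your cacerts file are something you have to supply yourself (the usual default password is "changeit").

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;

// Hypothetical helper for this example only.
class TrustStoreImport {

    /** Creates an empty in-memory keystore of the JVM's default type. */
    static KeyStore emptyStore() throws Exception {
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        ks.load(null, null); // null stream initializes an empty store
        return ks;
    }

    /** Parses a DER or PEM encoded X.509 certificate and adds it as a trusted entry. */
    static void addTrustedCert(KeyStore ks, String alias, InputStream certStream)
            throws Exception {
        CertificateFactory cf = CertificateFactory.getInstance("X.509");
        Certificate cert = cf.generateCertificate(certStream);
        ks.setCertificateEntry(alias, cert);
    }

    /** Loads a keystore file, adds the certificate from certFile, and writes it back. */
    static void importIntoFile(Path storeFile, char[] password, Path certFile, String alias)
            throws Exception {
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        try (InputStream in = Files.newInputStream(storeFile)) {
            ks.load(in, password);
        }
        try (InputStream certIn = Files.newInputStream(certFile)) {
            addTrustedCert(ks, alias, certIn);
        }
        try (OutputStream out = Files.newOutputStream(storeFile)) {
            ks.store(out, password);
        }
    }
}
```

You would call importIntoFile with the path to the cacerts file of the JVM that runs Nutch, the .cer file exported from Chrome, and any alias you like. Back up the cacerts file before overwriting it.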

Hope this works for you.

Upvotes: 0
