Reputation: 1423
I have a program that scrapes links from a webpage and then tests whether each link works or is broken. The part I'm having trouble with is making sure that each URL is actually valid.
The point of checking these links is to make sure the site works correctly from an end user's point of view, so they are mostly http, https and mailto links. I'm not sure whether we use any other protocols, such as ftp, but I'd like to be able to handle all unexpected cases.
Here is my code so far for constructing the URI. By this point the links have already been scraped from other pages:
    private boolean isValidURI(String checkUrl){
        boolean validURI = false;
        checkUrl = "this could be a link for some reason.com"; //set to link you want to test

        //Decodes checkUrl - Some links may already be encoded. This sets everything to a default of non-encoded urls.
        try {
            checkUrl = URLDecoder.decode(checkUrl, "UTF-8");
        } catch (UnsupportedEncodingException e1) {
            e1.printStackTrace();
            System.out.println("Error 1: "+checkUrl);
        }

        //Encodes checkUrl, allows URLs with various characters.
        try {
            url = new URL(checkUrl);
        } catch (MalformedURLException e2) {
            e2.printStackTrace();
            System.out.println("Error 2: "+checkUrl);
        }

        try {
            uri = new URI(url.getProtocol(), url.getAuthority(), url.getPath(), url.getQuery(), url.getRef());
            System.out.println(uri);
            validURI = true;
        } catch (URISyntaxException e3) {
            e3.printStackTrace();
            System.out.println("Error 3: "+checkUrl);
        }

        return validURI;
    }
What I'm struggling with is that if I pass in a link without a valid protocol, e.g. "this is the link.com", I get:
        at java.net.URL.<init>(Unknown Source)
        at java.net.URL.<init>(Unknown Source)
        at java.net.URL.<init>(Unknown Source)
        at xboxtools.PingUrl.isValidURI(PingUrl.java:106)
        at xboxtools.PingUrl.setLinkStatus(PingUrl.java:47)
        at xboxtools.PingUrl.<init>(PingUrl.java:28)
        at xboxtools.LocaleTab.runLocaleActionPerformed(LocaleTab.java:179)
        at xboxtools.LocaleTab$1$1.run(LocaleTab.java:71)
        at java.lang.Thread.run(Unknown Source)
    Exception in thread "Thread-2" java.lang.NullPointerException
        at xboxtools.PingUrl.isValidURI(PingUrl.java:113)
        at xboxtools.PingUrl.setLinkStatus(PingUrl.java:47)
        at xboxtools.PingUrl.<init>(PingUrl.java:28)
        at xboxtools.LocaleTab.runLocaleActionPerformed(LocaleTab.java:179)
        at xboxtools.LocaleTab$1$1.run(LocaleTab.java:71)
        at java.lang.Thread.run(Unknown Source)
Basically, I want to test whether each link I scrape is valid. If it's not, I want to set validURI to false and continue on to the next link.
Any suggestions on how I could improve this?
Upvotes: 2
Views: 1154
Reputation: 53694
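You get an NPE because you catch an exception (MalformedURLException) and then carry on with more code as if nothing happened. After that catch block, url is still null, so the call to url.getProtocol() in the next try block throws.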
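One way to avoid this is to return false as soon as a step fails, rather than falling through to the next one. The following is only a minimal sketch under a few assumptions: the class name LinkCheck is hypothetical, url is a local variable instead of a field, and the URLDecoder step from the question is left out for brevity. Note that new URL(String) is deprecated in recent JDKs, though it still compiles.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical class name, used only for this illustration.
public class LinkCheck {

    // Returns false at the first failing step, so later steps never see a null URL.
    static boolean isValidURI(String checkUrl) {
        URL url;
        try {
            url = new URL(checkUrl);
        } catch (MalformedURLException e) {
            // No usable protocol (or otherwise malformed): report invalid and stop here.
            return false;
        }
        try {
            // Same five-argument URI constructor as in the question.
            new URI(url.getProtocol(), url.getAuthority(), url.getPath(),
                    url.getQuery(), url.getRef());
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] links = {"this is the link.com", "http://example.com/index.html"};
        for (String link : links) {
            // An invalid link simply yields false and the loop moves on to the next one.
            System.out.println(link + " -> " + isValidURI(link));
        }
    }
}
```

With this shape, the caller can loop over the scraped links and skip any for which isValidURI returns false.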
Your question has nothing to do with URL validation; it's just simple debugging. When you run into a situation you don't understand, first try stepping through your code with a decent debugger.
Upvotes: 3