Reputation: 622
I have been asked to monitor about 10-20 sites for any changes.
I have been trying to read the response headers of these sites and check their last update time using this method:
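import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Date;

public class HeadRequest {
    public static void main( String[] args ) throws Exception {
        URL url = new URL( "http://www.wikipedia.org/" );
        HttpURLConnection httpConnection = (HttpURLConnection) url.openConnection();
        // HEAD asks the server for the headers only, not the page body.
        httpConnection.setRequestMethod( "HEAD" );
        httpConnection.connect();
        System.out.println( "Connection established" );
        // getLastModified() returns 0 when the Last-Modified header is absent.
        long lastModified = httpConnection.getLastModified();
        if ( lastModified != 0 ) {
            System.out.println( new Date( lastModified ) );
        } else {
            System.out.println( "Last-Modified not returned" );
        }
        httpConnection.disconnect();
    }
}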
The problem with this method is that many sites do not return complete header information. I would also like to know whether this is the right way to make a HEAD request to the server, or whether I am missing something.
Is there any other way to monitor a site?
I have also been converting the whole site to an MD5 value and comparing it between checks, but this method is too sensitive and notifies me of even the smallest changes.
Upvotes: 3
Views: 133
Reputation: 951
If the server doesn't provide an accurate Last-Modified header, it is up to you to work out when the site has changed. You will have to retrieve the web page at a regular interval and check for changes yourself. The MD5 sum is indeed sensitive to even the smallest of changes; perhaps you can find an alternative that is less sensitive. For example, if the website uses an HTML table to show bid postings, you can count the rows of the table.
Maybe you can do something involving the "click here to be notified about new bid postings" on the top right? :)
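To make the row-counting idea concrete, here is a rough sketch. The class name `RowCounter` and the method `countRows` are made up for illustration, and it assumes the postings really are rendered as `<tr>` rows in the page's HTML. A regular expression is only an approximation here; a real HTML parser would be more robust against unusual markup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts opening <tr> tags in a page's HTML.
final class RowCounter {

    // Matches "<tr>" or "<tr ...attributes...>", ignoring case.
    // Requiring whitespace or ">" after "tr" avoids matching tags like <track>.
    private static final Pattern ROW =
            Pattern.compile("<tr(\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);

    static int countRows(String html) {
        Matcher matcher = ROW.matcher(html);
        int rows = 0;
        while (matcher.find()) {
            rows++;
        }
        return rows;
    }
}
```

Storing the count from the previous check and comparing it with the new one would flag added or removed rows while ignoring changes elsewhere on the page. It would not notice a row whose text changed in place.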
Note: if you're taking the MD5 sum of the complete web response, you could be hashing header data as well, which is very likely to change. If you take the MD5 sum of the HTML without the header data, you may be able to monitor more accurately when the page changes. Just a suggestion - I don't want to solve your task for you if you are being paid :)
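A minimal sketch of hashing only the body might look like the following. The class `BodyFingerprint` and its methods are hypothetical names, the page is assumed to be UTF-8, and `InputStream.readAllBytes()` needs Java 9 or later. Collapsing whitespace before hashing is an extra step I added so that pure formatting changes do not alter the result; drop it if you want a strict byte-for-byte comparison.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: fingerprints the HTML body of a page, never the headers.
final class BodyFingerprint {

    // Returns the MD5 digest of the given bytes as a lowercase hex string.
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Assumption: whitespace-only differences should not count as a change.
    static String normalize(String html) {
        return html.replaceAll("\\s+", " ").trim();
    }

    // Hash of the normalized HTML text.
    static String fingerprint(String html) {
        return md5Hex(normalize(html).getBytes(StandardCharsets.UTF_8));
    }

    // Reads the response body only; getInputStream() does not include headers.
    static String fetchBody(String address) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(address).openConnection();
        connection.setRequestMethod("GET");
        try (InputStream in = connection.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            connection.disconnect();
        }
    }
}
```

On each check you would call `BodyFingerprint.fingerprint(BodyFingerprint.fetchBody(address))` and compare the result with the value stored from the previous run. Pages that embed timestamps, ads, or session tokens in the HTML will still change on every request, so some further filtering may be needed.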
Additional note: I see that you have your own code to request the web page. I feel I must suggest that you use one of the many existing Java web crawler libraries; the code will likely become more reliable and much easier to work with.
Upvotes: 1