Reputation: 10312
I am trying to extract the content of a webpage A. Using Groovy, I've tried the following:
......
String urlStr = "url-of-webpage-A"
String pageText = urlStr.toURL().text
//println pageText
.....
The above code retrieves the text of webpage A as long as it doesn't redirect to another webpage B. If A redirects to B, the content of webpage B ends up in the pageText variable. Is there a way to check in code whether webpage A redirects to another webpage (in Groovy or Java)?
PS: The above piece of code is not part of any server-side logic. I am executing it on the client side, within a desktop application.
Upvotes: 4
Views: 4750
Reputation: 308149
In Java you can use URL.openConnection() to get an HttpURLConnection (you'll need to cast). On this you can call setInstanceFollowRedirects(false).
Then you can call getResponseCode() and check whether it is HTTP_MOVED_PERM (301), HTTP_MOVED_TEMP (302) or HTTP_SEE_OTHER (303). They all indicate redirection.
If you need to know where you're being redirected to, then you can use getHeaderField("Location") to get the location header.
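Put together, the steps above might look like the following minimal sketch. The class name RedirectCheck and the helpers isRedirect and redirectTarget are made up for illustration; only the HttpURLConnection calls come from the description above. It assumes an http:// or https:// URL, since the cast fails for other schemes.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class RedirectCheck {

    /** True for the three redirect status codes mentioned above (301, 302, 303). */
    static boolean isRedirect(int statusCode) {
        return statusCode == HttpURLConnection.HTTP_MOVED_PERM
            || statusCode == HttpURLConnection.HTTP_MOVED_TEMP
            || statusCode == HttpURLConnection.HTTP_SEE_OTHER;
    }

    /**
     * Returns the value of the Location header if the URL answers with a
     * redirect, or null if it does not redirect.
     */
    static String redirectTarget(String urlStr) throws IOException {
        // Assumption: urlStr is an http(s) URL, otherwise this cast throws.
        HttpURLConnection con = (HttpURLConnection) new URL(urlStr).openConnection();
        try {
            // Stop the connection from silently following the redirect.
            con.setInstanceFollowRedirects(false);
            int code = con.getResponseCode();
            return isRedirect(code) ? con.getHeaderField("Location") : null;
        } finally {
            con.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java RedirectCheck <url>");
            return;
        }
        String target = redirectTarget(args[0]);
        System.out.println(target == null ? "No redirect" : "Redirects to " + target);
    }
}
```

Servers may also answer with other 3xx codes (307, for example); if that matters, extending isRedirect is the place to do it.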
Upvotes: 14
Reputation: 171164
In Groovy, you could do what Joachim suggests like this:
String location = "url-of-webpage-A"
boolean wasRedirected = false
String pageContent = null
while( location ) {
    new URL( location ).openConnection().with { con ->
        // We'll do redirects ourselves
        con.instanceFollowRedirects = false
        // Get the location to jump to (null if the header is absent, which ends the loop)
        location = con.getHeaderField( "Location" )
        if( !wasRedirected && location ) {
            wasRedirected = true
        }
        // Read the HTML and close the inputstream
        pageContent = con.inputStream.withReader { it.text }
    }
}
println "wasRedirected:$wasRedirected contentLength:${pageContent.length()}"
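One caveat with a manual redirect loop like this: the Location header is allowed to be a relative reference (for example /login), and a cycle of redirects would never terminate. The Java sketch below shows one way to handle both. The names RedirectFollower, resolveLocation, finalUrl and the MAX_HOPS limit of 10 are assumptions for illustration, not part of the answer above.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class RedirectFollower {

    // Hypothetical safety limit so a redirect cycle cannot loop forever.
    static final int MAX_HOPS = 10;

    /**
     * Resolves a Location header value against the URL that returned it.
     * Absolute locations are returned unchanged. URI.create throws
     * IllegalArgumentException if either string is not a valid URI.
     */
    static String resolveLocation(String currentUrl, String location) {
        return URI.create(currentUrl).resolve(location).toString();
    }

    /** Follows Location headers by hand and returns the last URL reached. */
    static String finalUrl(String startUrl) throws IOException {
        String current = startUrl;
        for (int hop = 0; hop < MAX_HOPS; hop++) {
            // Assumption: every URL in the chain is http(s), otherwise the cast throws.
            HttpURLConnection con =
                (HttpURLConnection) URI.create(current).toURL().openConnection();
            try {
                con.setInstanceFollowRedirects(false);
                // As in the Groovy loop, any response carrying a Location header is followed.
                String location = con.getHeaderField("Location");
                if (location == null) {
                    return current;
                }
                current = resolveLocation(current, location);
            } finally {
                con.disconnect();
            }
        }
        throw new IOException("Too many redirects, last URL: " + current);
    }
}
```

For instance, resolveLocation("http://example.com/a/b", "/c") yields "http://example.com/c".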
If you don't want to be redirected, and want the contents of the first page, you simply need to do:
String location = "url-of-webpage-A"
String pageContent = new URL( location ).openConnection().with { con ->
    // Don't follow redirects automatically
    con.instanceFollowRedirects = false
    // Get the location to jump to (in case of a redirect)
    location = con.getHeaderField( "Location" )
    // Read the HTML and close the inputstream
    con.inputStream.withReader { it.text }
}
if( location ) {
    println "Page wanted to redirect to $location"
}
println "Content was:"
println pageContent
Upvotes: 4