Reputation: 609
I want to use Jsoup to fetch the content of http://ws.audioscrobbler.com/2.0/?method=track.getInfo&api_key=550633c179112c8002bc6a0942d55b2a&artist=lucinda%20williams&track=lake%20charles
The code is:
Document doc = Jsoup.connect("http://ws.audioscrobbler.com /2.0/?method=track.getInfo&api_key=550633c179112c8002bc6a0942d55b2a&artist=lucinda williams&track=lake charles")
.userAgent("Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:20.0) Gecko/20100101 Firefox/20.0")
.timeout(5000)
.get();
However, it throws an exception:
Exception in thread "main" java.net.SocketException: Unexpected end of file from server
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:770)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:767)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1162)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:397)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:429)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:410)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:164)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:153)
at JsoupXML.main(JsoupXML.java:16)
But when I visit the URL in a browser, everything is OK. Besides, when I use the code above to fetch the content of http://ws.audioscrobbler.com/2.0/?method=track.getInfo&api_key=550633c179112c8002bc6a0942d55b2a&artist=cher&track=believe , everything is OK too.
Do you know the reason, and do you have any ideas for solving it?
Thanks for your attention, and sorry about my English.
Thanks to NeplatnyUdaj for the kind help; you gave me a wonderful hint. I forgot to replace whitespace and other special characters with %20, %26 and so on.
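For anyone hitting the same problem, here is a minimal sketch of that fix. It percent-encodes each query parameter value before the URL is handed to Jsoup. The class name TrackInfoUrl, the helper names, and the YOUR_API_KEY placeholder are my own assumptions for illustration, not part of Jsoup or the Last.fm API. Note that URLEncoder produces form encoding, where a space becomes "+", so the sketch rewrites "+" to "%20" afterwards.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class TrackInfoUrl {

    // Hypothetical helper: percent-encode a single query parameter value.
    // URLEncoder encodes a space as "+", so it is rewritten to "%20".
    // A literal "+" in the input has already become "%2B" at that point,
    // so the replacement cannot corrupt it.
    static String encodeParam(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every JVM.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: build the track.getInfo URL from raw values.
    static String buildUrl(String apiKey, String artist, String track) {
        return "http://ws.audioscrobbler.com/2.0/?method=track.getInfo"
                + "&api_key=" + encodeParam(apiKey)
                + "&artist=" + encodeParam(artist)
                + "&track=" + encodeParam(track);
    }

    public static void main(String[] args) {
        // YOUR_API_KEY is a placeholder, not a real key.
        String url = buildUrl("YOUR_API_KEY", "lucinda williams", "lake charles");
        System.out.println(url);
        // The resulting string can then be passed to Jsoup.connect(url).
    }
}
```

With the values above, the spaces in the artist and track names come out as %20, which matches the URL that works in the browser.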
Upvotes: 3
Views: 56147
Reputation: 1698
Well. The exception means that the remote server closed the connection unexpectedly.
The answer below assumes that the spaces visible in the URL in the question's code are not actually present in your code.
There is really nothing much you can do except catch the exception and try again (or report an error to the user).
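A rough sketch of the catch-and-retry approach, using only the standard library. The names RetryFetch, IoCall and withRetry are made up for this example, and the attempt count and delay are arbitrary. SocketException is a subclass of IOException, so catching IOException covers the "Unexpected end of file from server" case.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class RetryFetch {

    // Hypothetical functional interface for an action that may fail with an I/O error.
    interface IoCall<T> {
        T run() throws IOException;
    }

    // Runs the action up to maxAttempts times, sleeping delayMillis between
    // failed attempts. If every attempt fails, the last IOException is
    // rethrown wrapped in an UncheckedIOException.
    static <T> T withRetry(IoCall<T> action, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.run();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delayMillis);
                    } catch (InterruptedException ie) {
                        // Restore the interrupt flag and stop retrying.
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("Interrupted while waiting to retry", ie);
                    }
                }
            }
        }
        throw new UncheckedIOException(last);
    }

    public static void main(String[] args) {
        // Stand-in action; in the question's code this would be something like
        // () -> Jsoup.connect(url).timeout(5000).get()
        String result = withRetry(() -> "fetched", 3, 1000L);
        System.out.println(result);
    }
}
```

If all attempts fail, the caller still gets an exception and can report the error to the user.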
As for why the server closed the connection:
On a related note: Including the API-key in the question might not be optimal.
Upvotes: 3
Reputation: 9813
Change the user agent (or at least define it).
More details: Scraping a site
Upvotes: 1