Reputation: 4483
I have extracted the outlinks of a web page with Jsoup, and I now want to fetch and parse each of those links in turn. The problem is that the links are relative, of the form: ../../../pincode/india/andaman-and-nicobar-islands/
In this form I cannot parse them, so I converted them to absolute URLs using link.attr("abs:href")
as suggested in another Stack Overflow post.
The URL of the first web page that I parsed is: http://www.mapsofindia.com/pincode/india/
The absolute URLs that I get back are of the form http://www.mapsofindia.com/../pincode/india/andaman-and-nicobar-islands/
However, I cannot parse these further using Jsoup. When I execute the following statement:
Jsoup.parse("http://www.mapsofindia.com/../pincode/india/andaman-and-nicobar-islands/");
it gives an HTTP 400 error, i.e. bad request, so I think there is a problem with the URLs. Can anyone help me get the URLs in a proper form so that I can parse them further? Thank you.
Upvotes: 0
Views: 519
Reputation: 25350
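Please try these two things:
Use link.absUrl("href")
instead of link.attr("abs:href")
Check baseUri()
on your element or document.
By the way, you'd better use the connect()
method for fetching a page, since Jsoup.parse(String) treats its argument as HTML text rather than as a URL to load: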
Document doc = Jsoup.connect("http://<your url here>").get();
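If the absolute URL still contains a leading /../ segment, one possible workaround is to clean it up yourself before calling connect(). The following is only a minimal sketch using the JDK's java.net.URI rather than Jsoup; the class LinkResolver and its resolve method are hypothetical names made up for this illustration, and it assumes plain hierarchical http/https URLs. It resolves the relative href against the base page and then drops any ".." segments that would climb above the site root:

```java
import java.net.URI;

class LinkResolver {
    // Hypothetical helper (not part of Jsoup): resolves href against base and
    // removes ".." segments left over at the start of the path.
    // Assumes a hierarchical URL with a scheme and authority (e.g. http://host/...).
    static String resolve(String base, String href) {
        URI uri = URI.create(base).resolve(href).normalize();
        String path = uri.getRawPath();

        // Any ".." still at the front of the path points above the site root; strip it.
        while (path.startsWith("/../")) {
            path = path.substring(3);
        }
        if (path.equals("/..")) {
            path = "/";
        }

        // Rebuild the URL from its raw (already encoded) parts.
        StringBuilder out = new StringBuilder();
        out.append(uri.getScheme()).append("://").append(uri.getRawAuthority()).append(path);
        if (uri.getRawQuery() != null) {
            out.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            out.append('#').append(uri.getRawFragment());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String base = "http://www.mapsofindia.com/pincode/india/";
        String href = "../../../pincode/india/andaman-and-nicobar-islands/";
        // prints http://www.mapsofindia.com/pincode/india/andaman-and-nicobar-islands/
        System.out.println(resolve(base, href));
    }
}
```

The cleaned string can then be passed to Jsoup.connect(...).get() as in the line above. Stripping the extra ".." segments mirrors what browsers do with such links, but whether the server actually serves the resulting URL is something you would need to verify.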
Upvotes: 1