Reputation: 796
I want to extract two adjacent tags from a website. The first is an a tag, and its href should be extracted as an absolute URL. The second is a div tag, and I want to extract the text inside it.
I want the output to look like this:
100 USD http:\www.somesite..............
200 usd http:\www.thesite.............
Why? Because later I will insert them into a table in a database.
I tried the following code, but I couldn't get the absolute URL, and I couldn't get rid of the tags; I want to extract only the data (without tags).
Document doc = Jsoup.connect("http://www.bezaat.com/ksa/jeddah/cars/all/1?so=77").get();
for (Element link : doc.select("div.rightFloat.price,a[abs:href].more-details"))
{
String absHref = url.attr("abs:href");
String attr = link.absUrl("href");
System.out.println(link);
}
If I use System.out.println(link.text()) in my code, I lose the hyperlink completely!
Any help please?
Upvotes: 0
Views: 967
Reputation: 11712
I don't think that Jsoup CSS selector combinators (i.e. the comma in the selector) guarantee an ordering in the output. At least I would not count on it, even if you happen to find the two elements in the order you expect. Instead of using the comma selector, I would first loop over the outer containers that hold the adjacent elements you are interested in. Within each container you can then access the price and the link.
Something like this. Note that this is off the top of my head and untested!
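import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
Document doc = Jsoup.connect("http://www.bezaat.com/ksa/jeddah/cars/all/1?so=77").get();
for (Element adDiv : doc.select("div.category-listing-normal-ad")) {
    Element priceDiv = adDiv.select("div.rightFloat.price").first();
    Element linkA = adDiv.select("a.more-details").first();
    if (priceDiv == null || linkA == null) continue; // skip ads missing either element
    System.out.println(priceDiv.text() + " " + linkA.absUrl("href")); // text() drops the tags
}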
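As a rough illustration of what absUrl("href") does, a relative href is resolved against the URL the page was loaded from. The sketch below reproduces that step with plain java.net.URI so it runs without Jsoup. The class name, the method names and the href "/ksa/ad/123" are made up for the example and are not taken from the real site.

```java
import java.net.URI;

public class AbsoluteLinkSketch {

    // Resolve a possibly relative href against the page URL
    // (similar in spirit to what Jsoup's absUrl("href") returns).
    public static String toAbsolute(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    // Build one "price url" output line for a single ad.
    public static String formatRow(String price, String pageUrl, String href) {
        return price.trim() + " " + toAbsolute(pageUrl, href);
    }

    public static void main(String[] args) {
        String pageUrl = "http://www.bezaat.com/ksa/jeddah/cars/all/1?so=77";
        // "/ksa/ad/123" is a hypothetical href, used only for illustration
        System.out.println(formatRow(" 100 USD ", pageUrl, "/ksa/ad/123"));
    }
}
```

With Jsoup itself you would not need this helper, since the document keeps the URL passed to Jsoup.connect(...) as its base URI and absUrl resolves against it.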
Upvotes: 2