Reputation: 81
I tried to get the text from these div tags, but every selector returns nothing:
HTML:
<div id="comments" class="part comments last"><script type="text/rocketscript" data-rocketsrc="http://sabq.org/js/parts/comments/global.js?1324675506" data-rocketoptimized="true"></script>
<h2 class="header">التعليقات (23)</h2>
<div id="comment_5946146" class="item
">
<h4 class="direction1">
<span class="serial">1</span> *محمد * </h4>
<p class="direction1"><span class="date-time">17 جمادى الأولى 1435 | 12:46 AM</span></p>
<p class="like-buttons">
<span class="like " title="أعجبني"><span class="value">5</span></span>
<span class="sep">-</span>
<span class="unlike " title="لم يعجبني"><span class="value">0</span></span>
<input type="hidden" name="class" value="Comment">
<input type="hidden" name="id" value="5946146">
</p>
<br clear="all">
<div class="message">هؤلاء أشخاص لم يجدوا سبيلاً لطلب الرزق إلا بهذه الطريقة فكفاكم تضييقاً وخناقاً عليهم حتى في مصادر رزقهم ....</div>
</div>
I want to get the text of the div with class "message", the text inside the h4 tag, and the text of the span with class "date-time". I tried:
document.select("div.message");
And:
document.select("div.comments").select("div.message");
But they didn't work.
Upvotes: 1
Views: 4514
Reputation: 8617
I tried five selectors against your HTML and they all work. One note: document.select("div.message") returns a collection (Elements), not a single Element.
Code:
Document document = Jsoup.parse(new File("some.html"),"utf-8");
Element message = document.select("div.message").first();
Element span = document.select("span.date-time").first();
Element comments = document.select("div.comments").first();
Element h4 = document.select("h4.direction1").first();
Element test = document.select("div.comments").select("div.message")
.first();
System.out.println(message.text());
System.out.println(span.text());
System.out.println(comments.text());
System.out.println(h4.text());
System.out.println(test.text());
Output:
هؤلاء أشخاص لم يجدوا سبيلاً لطلب الرزق إلا بهذه الطريقة فكفاكم تضييقاً وخناقاً عليهم حتى في مصادر رزقهم ....
17 جمادى الأولى 1435 | 12:46 AM
التعليقات (23) 1 *محمد * 17 جمادى الأولى 1435 | 12:46 AM 5 - 0 هؤلاء أشخاص لم يجدوا سبيلاً لطلب الرزق إلا بهذه الطريقة فكفاكم تضييقاً وخناقاً عليهم حتى في مصادر رزقهم ....
1 *محمد *
هؤلاء أشخاص لم يجدوا سبيلاً لطلب الرزق إلا بهذه الطريقة فكفاكم تضييقاً وخناقاً عليهم حتى في مصادر رزقهم ....
PS: I used .first() only to show that the selectors are valid, since each selector matches exactly one element in your sample HTML. If a selector matches multiple elements, iterate over the collection to get the individual results, like:
Elements message = document.select("div.message");
for (Element el : message)
System.out.println(el.text());
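Since the question asks for three fields per comment, one way to keep them together is to select each comment container first and then query inside it. This is only a minimal sketch, assuming every comment is a div with class "item" inside the comments div, as in the sample HTML. The class name CommentExtractor and the helpers textOf and extract are hypothetical names used for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class CommentExtractor {

    // Returns the text of the first match inside parent, or "" if nothing matches.
    // select(...).first() returns null when there is no match, so guard against it.
    static String textOf(Element parent, String cssQuery) {
        Element el = parent.select(cssQuery).first();
        return el == null ? "" : el.text();
    }

    // Hypothetical helper: builds one {author, date, message} row per comment.
    // Assumes each comment is a div.item nested inside div.comments.
    static List<String[]> extract(String html) {
        Document document = Jsoup.parse(html);
        List<String[]> rows = new ArrayList<>();
        for (Element item : document.select("div.comments div.item")) {
            rows.add(new String[] {
                textOf(item, "h4.direction1"),
                textOf(item, "span.date-time"),
                textOf(item, "div.message")
            });
        }
        return rows;
    }

    public static void main(String[] args) {
        // Tiny stand-in for the real page, with the same class names as the sample.
        String html = "<div id=\"comments\" class=\"part comments last\">"
                + "<div id=\"comment_1\" class=\"item\">"
                + "<h4 class=\"direction1\"><span class=\"serial\">1</span> Author </h4>"
                + "<p class=\"direction1\"><span class=\"date-time\">12:46 AM</span></p>"
                + "<div class=\"message\">Some message</div>"
                + "</div></div>";
        for (String[] row : extract(html)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Scoping the inner selectors to each item keeps an author, date, and message from different comments from being mixed up, which can happen if three separate document-wide lists are zipped together and one comment lacks a field.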
EDIT:
To parse from the URL instead of a local file, change:
Document document = Jsoup.parse(new File("some.html"),"utf-8");
to:
Document document = Jsoup.connect("http://sabq.org/WzUfde").userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21").get();
This works for me. The output is too long to post here, but you can test it for your case.
Upvotes: 1