Reputation: 17208
I want to select
<div class="article_text">some long text </div>
with Jsoup.
String url = "http://www.computerworld.bg/45781_sofiya_teh_park_tryabva_da_bade_zavarshen_do_kraya_na_2015_g";
Document document = Jsoup.parse(new URL(url).openStream(), "ISO-8859-1", url);
Elements elements = document.select("div.article_text");
Then I want to iterate over the elements and get their text, but the div is not selected. If I use just div as the CSS selector, the correct text shows up, but so does text from other, irrelevant divs, so I have to use the class name.
What am I doing wrong?
Upvotes: 0
Views: 1336
Reputation: 24423
The documentation shows that this selector syntax is valid:
Element masthead = doc.select("div.masthead").first(); // div with class=masthead
So, I think the _ is causing the problem. Try div[class=article_text] as the selector, and if that doesn't work, then div[class^=article] (class starts with article), but it could select more than you would want.
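One way to compare these selectors without touching the network is to run them against an inline snippet. This is only a minimal sketch, assuming Jsoup is on the classpath; the sample HTML, the class name SelectorComparison and the helper count() are made up for illustration and are not taken from the real page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorComparison {

    // Hypothetical markup: one target div, one div sharing the "article" prefix, one unrelated div.
    static final String SAMPLE_HTML =
            "<div class=\"article_text\">some long text </div>"
          + "<div class=\"article_meta\">author and date</div>"
          + "<div class=\"sidebar\">unrelated</div>";

    // Parses the given HTML and returns how many elements the CSS selector matches.
    static int count(String html, String selector) {
        Document doc = Jsoup.parse(html);
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        String[] selectors = {
            "div.article_text",        // class selector
            "div[class=article_text]", // exact attribute value
            "div[class^=article]"      // attribute value starts with "article"
        };
        for (String selector : selectors) {
            System.out.println(selector + " -> " + count(SAMPLE_HTML, selector));
        }
    }
}
```

On this sample the first two selectors match only the target div, while the prefix selector also picks up the article_meta div, which is the "more than you would want" case. If the class selector matches here but not on the live page, the markup being served is the more likely suspect than the selector.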
UPDATE
div.article_text works on the online Jsoup tester (http://try.jsoup.org/) with the URL from your code. Maybe the issue is how you are getting the document. This example uses Jsoup.connect():
Document doc = Jsoup.connect("http://www.computerworld.bg/45781_sofiya_teh_park_tryabva_da_bade_zavarshen_do_kraya_na_2015_g").get();
UPDATE 2
It turns out this particular URL returns different content depending on the user agent (without a user agent set, the article_text class is not present on that div), so just set userAgent to, for example, Mozilla and it will work.
Jsoup.connect("http://www.computerworld.bg/45781_sofiya_teh_park_tryabva_da_bade_zavarshen_do_kraya_na_2015_g").userAgent("Mozilla").get();
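Putting it together with the iteration from the question, a minimal sketch might look like this. The class name ArticleTextFetcher and the helper articleTexts() are made up for illustration, and whether the site still serves the same markup for this user agent is an assumption.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ArticleTextFetcher {

    // Collects the text of every div.article_text in an already parsed document.
    static List<String> articleTexts(Document doc) {
        List<String> texts = new ArrayList<>();
        for (Element div : doc.select("div.article_text")) {
            texts.add(div.text()); // text() returns trimmed, whitespace-normalized text
        }
        return texts;
    }

    public static void main(String[] args) throws IOException {
        // Send a browser-like user agent so the server returns the markup with article_text.
        Document doc = Jsoup.connect("http://www.computerworld.bg/45781_sofiya_teh_park_tryabva_da_bade_zavarshen_do_kraya_na_2015_g")
                .userAgent("Mozilla")
                .get();

        for (String text : articleTexts(doc)) {
            System.out.println(text);
        }
    }
}
```

Keeping the selection in its own method means it can be checked against a locally parsed string (via Jsoup.parse) before involving the network at all.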
Upvotes: 2