Select div -class tag using Jsoup with Java

Question

I want to select

<div class="article_text">some long text</div>

with Jsoup.

String url = "http://www.computerworld.bg/45781_sofiya_teh_park_tryabva_da_bade_zavarshen_do_kraya_na_2015_g";
Document document = Jsoup.parse(new URL(url).openStream(), "ISO-8859-1", url);
Elements elements = document.select("div.article_text");

Then I want to iterate over the elements and get their text, but the div is not selected. If I use just div as the CSS selector, the correct text shows up, but so does the text of other, irrelevant divs, so I have to use the class name.

What am I doing wrong?

Predrag Maric · Accepted Answer

The documentation shows that this kind of selector is fine:

Element masthead = doc.select("div.masthead").first(); // div with class=masthead

So I suspect the underscore (_) is causing the problem. Try div[class=article_text] as the selector, and if that doesn't work, try div[class^=article] (class attribute starts with article), though that could select more than you want.
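The three selectors mentioned above can be compared offline before involving the real page. What follows is only a minimal sketch, assuming jsoup is on the classpath; the class name SelectorCheck, the count helper, and the HTML snippet are invented for illustration and are not taken from the site.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorCheck {
    // Hypothetical markup, standing in for the real page.
    static final String HTML =
          "<div class=\"article_text\">some long text</div>"
        + "<div class=\"article_footer\">footer</div>"
        + "<div class=\"other\">unrelated</div>";

    // Returns how many elements of the sample markup match the selector.
    static int count(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        System.out.println(count("div.article_text"));        // class selector
        System.out.println(count("div[class=article_text]")); // exact attribute value
        System.out.println(count("div[class^=article]"));     // attribute prefix
    }
}
```

On this sample the class selector and the exact attribute match each find one div, while the prefix match also picks up article_footer, which illustrates the "could select more than you want" caveat.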

UPDATE

div.article_text works in the online Jsoup tester (http://try.jsoup.org/) with the URL from your code. Maybe the issue is how you are getting the document. This example uses Jsoup.connect():

Document doc = Jsoup.connect("http://www.computerworld.bg/45781_sofiya_teh_park_tryabva_da_bade_zavarshen_do_kraya_na_2015_g").get();

UPDATE 2

It turns out this particular URL returns different content depending on the user agent (without a user agent set, the article_text class is not present on that div), so just set the user agent to, for example, Mozilla and it will work.

Document doc = Jsoup.connect("http://www.computerworld.bg/45781_sofiya_teh_park_tryabva_da_bade_zavarshen_do_kraya_na_2015_g").userAgent("Mozilla").get();
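Putting the pieces together, fetching with a user agent and then iterating over the selected divs might look roughly like this. It is a sketch under assumptions, not a definitive implementation: jsoup is assumed to be on the classpath, the class name ArticleText and the extractTexts helper are hypothetical, and the site may no longer respond the way it did when this was written.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ArticleText {
    // Collects the text of every div with class article_text (hypothetical helper).
    static List<String> extractTexts(Document doc) {
        List<String> texts = new ArrayList<>();
        for (Element div : doc.select("div.article_text")) {
            texts.add(div.text());
        }
        return texts;
    }

    public static void main(String[] args) throws IOException {
        String url = "http://www.computerworld.bg/45781_sofiya_teh_park_tryabva_da_bade_zavarshen_do_kraya_na_2015_g";
        // Without a user agent this site reportedly serves markup lacking the class.
        Document doc = Jsoup.connect(url).userAgent("Mozilla").get();
        for (String text : extractTexts(doc)) {
            System.out.println(text);
        }
    }
}
```

Keeping the extraction in its own method means it can be tried against a locally parsed string, independent of what the server returns.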
