Reputation: 203
I'm trying to get the text of the nested spans using the code below. However, the output behaves as if the nested spans don't exist:
Elements tags = document.select("div[id=tags]");
for (Element tag : tags) {
    Elements child_tags = tag.getElementsByTag("class");
    String key = tag.html();
    System.out.println(key); // only as a test
    for (Element child_tag : child_tags) {
        System.out.println("\t" + child_tag.text());
    }
}
My output is:
<hr />Tags:
<span id="category"></span>
<span id="voteSelector" class="initially_hidden"> <br /> </span>
Upvotes: 0
Views: 1640
Reputation: 1538
Assuming you are running the code against https://chesstempo.com/chess-problems/15 and the data you want is the tag text displayed on that page:
Jsoup only sees what the server returns as the page source. To confirm, press Ctrl+U in the browser, which opens a new window showing the same markup that Jsoup will receive. If you check that view, you will see that the part you are trying to retrieve is not present in the page source.
If content is rendered by JavaScript, it is not visible to Jsoup, so you have to use something else that runs the JavaScript and gives you the resulting markup.
Jsoup does not run JavaScript and is not a browser.
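To illustrate the point, here is a minimal sketch that does not touch the real site. The markup, the class name NoScriptDemo and the tag value "Fork" are made up for the example; only the Jsoup calls are real. The span is empty in the source and would be filled in by the script in a browser, but Jsoup only parses the text and never executes the script:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class NoScriptDemo {
    // Hypothetical page: the span is empty until the script runs in a browser.
    static final String HTML =
        "<div id=\"tags\"><span id=\"category\"></span></div>"
        + "<script>document.getElementById('category').textContent = 'Fork';</script>";

    // Text of the span as Jsoup sees it (the script has NOT been executed).
    static String categoryText() {
        Document doc = Jsoup.parse(HTML);
        return doc.getElementById("category").text();
    }

    // The script is still present in the document, but only as inert data.
    static String scriptSource() {
        Document doc = Jsoup.parse(HTML);
        return doc.select("script").first().data();
    }

    public static void main(String[] args) {
        System.out.println("[" + categoryText() + "]"); // prints "[]"
        System.out.println(scriptSource().contains("Fork")); // prints "true"
    }
}
```

This is the same situation as the empty span id="category" in the question's output: the element is there, but its content only appears after the browser has run the page's scripts.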
EDIT
There is a workaround using Selenium. Below is working code that gets the rendered source of the URL and extracts the data you are looking for:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
public class JsoupDummy {
    public static void main(String[] args) {
        // Point Selenium at the local geckodriver binary (adjust the path for your machine).
        System.setProperty("webdriver.gecko.driver", "D:\\thirdPartyApis\\geckodriver-v0.19.1-win32\\geckodriver.exe");
        WebDriver driver = new FirefoxDriver();
        try {
            // Load the page in a real browser so that its JavaScript runs.
            driver.get("https://chesstempo.com/chess-problems/15");
            // Hand the rendered page source to Jsoup for parsing.
            Document doc = Jsoup.parse(driver.getPageSource());
            Elements elements = doc.select("span.ct-active-tag");
            for (Element element : elements) {
                System.out.println(element.html());
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Always close the browser, even if something failed.
            driver.quit();
        }
    }
}
You need Selenium WebDriver, which drives a real browser and therefore lets you read the HTML content written by scripts as well.
Upvotes: 1
Reputation: 16498
Elements child_tags = tag.getElementsByTag("class");
With this line you are trying to get elements whose tag name is class, i.e. <class>...</class>, which do not exist. Change that line to:
Elements child_tags = tag.getElementsByClass("tag");
to get elements whose class attribute has the value tag, or to:
Elements child_tags = tag.getElementsByTag("span");
to get elements whose tag name is span.
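The difference between the lookups can be checked with a small sketch. The HTML string is modelled on the output posted in the question, and the class name SelectorDemo and its helper methods are only for illustration:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class SelectorDemo {
    // Static markup modelled on the output shown in the question.
    static final String HTML =
        "<div id=\"tags\"><hr />Tags: "
        + "<span id=\"category\"></span>"
        + "<span id=\"voteSelector\" class=\"initially_hidden\"><br /></span>"
        + "</div>";

    // Number of descendants of div#tags with the given tag name.
    static int countByTag(String tagName) {
        Element tags = Jsoup.parse(HTML).getElementById("tags");
        return tags.getElementsByTag(tagName).size();
    }

    // Number of descendants of div#tags with the given class attribute value.
    static int countByClass(String className) {
        Element tags = Jsoup.parse(HTML).getElementById("tags");
        return tags.getElementsByClass(className).size();
    }

    public static void main(String[] args) {
        System.out.println(countByTag("class"));              // prints 0: there is no <class> element
        System.out.println(countByTag("span"));               // prints 2: both nested spans
        System.out.println(countByClass("initially_hidden")); // prints 1: only span#voteSelector
    }
}
```

Note that the spans are found here, but they are still empty in this markup, so child_tag.text() would print blank lines for them.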
Upvotes: 1