A. Napster

Reputation: 203

How to get text from nested span using Jsoup?

I'm trying to get the text in the span using the code below. However, the output behaves as if the nested spans don't exist.

    Elements tags = document.select("div[id=tags]");

    for (Element tag : tags) {
        Elements child_tags = tag.getElementsByTag("class");

        String key = tag.html();
        System.out.println(key); // only as a test

        for (Element child_tag : child_tags) {
            System.out.println("\t" + child_tag.text());
        }
    }

My output is

      <hr />Tags: 
      <span id="category"></span> 
      <span id="voteSelector" class="initially_hidden"> <br /> </span>      

Upvotes: 0

Views: 1640

Answers (2)

Rishal

Reputation: 1538

Assuming you are running the code against https://chesstempo.com/chess-problems/15 and the data you want is the tag list rendered on that page.

Jsoup only gets the HTML that the server sends, which is what the browser shows as the page source. To confirm, press CTRL+U in the browser: the window that opens shows the content Jsoup will actually receive. The part you are trying to retrieve is not present in that source, which you can check the same way.

If the content is rendered by JavaScript, it will not be visible to Jsoup, so you have to use something else that runs the JavaScript and gives you the resulting HTML.

Jsoup does not run JavaScript and is not a browser.
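A minimal sketch of that behaviour, assuming a made-up page in which a script would fill the span in a real browser (the markup, the `tag` class, and the class name `StaticSourceDemo` are illustrative and not taken from the actual site):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class StaticSourceDemo {
    // Hypothetical page: in a browser, the script would insert a nested span into #category.
    static final String HTML =
        "<html><body><div id=\"tags\">Tags: <span id=\"category\"></span></div>"
      + "<script>document.getElementById('category').innerHTML = "
      + "'<span class=\"tag\">Fork</span>';</script>"
      + "</body></html>";

    // Text inside #category as Jsoup sees it (the script is never executed).
    static String categoryText() {
        Document doc = Jsoup.parse(HTML);
        return doc.getElementById("category").text();
    }

    // Number of nested span.tag elements Jsoup finds inside #category.
    static int nestedTagCount() {
        Document doc = Jsoup.parse(HTML);
        return doc.select("#category span.tag").size();
    }

    public static void main(String[] args) {
        System.out.println("text: '" + categoryText() + "'"); // text: ''
        System.out.println("nested tags: " + nestedTagCount()); // nested tags: 0
    }
}
```

The span stays empty because Jsoup treats the script body as plain data, which matches the empty `<span id="category"></span>` in the question's output.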

EDIT

There is a workaround using Selenium. Below is working code that gets the page source from a real browser and extracts the data you are looking for:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class JsoupDummy {
    public static void main(String[] args) {
        // Path to the local geckodriver binary; adjust it for your machine.
        System.setProperty("webdriver.gecko.driver", "D:\\thirdPartyApis\\geckodriver-v0.19.1-win32\\geckodriver.exe");
        WebDriver driver = new FirefoxDriver();

        try {
            driver.get("https://chesstempo.com/chess-problems/15");
            // Parse the page source reported by the browser instead of fetching the URL with Jsoup.
            Document doc = Jsoup.parse(driver.getPageSource());
            Elements elements = doc.select("span.ct-active-tag");
            for (Element element : elements) {
                System.out.println(element.html());
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Always close the browser, even if scraping fails.
            driver.quit();
        }
    }
}

You need Selenium WebDriver, which drives a real browser and therefore lets you read HTML content written by scripts as well.

Upvotes: 1

Eritrean

Reputation: 16498

Elements child_tags = tag.getElementsByTag("class");

With this line you are trying to get elements whose tag name is class, i.e. <class>...</class>, which do not exist. Change that line to:

Elements child_tags = tag.getElementsByClass("tag");

to get elements whose class attribute has the value tag, or to:

Elements child_tags = tag.getElementsByTag("span"); 

to get elements by tag name = span.
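As a rough illustration of the difference between the three calls, here is a small sketch run against made-up markup (the HTML string, the `tag` class, and the class name `TagLookupDemo` are assumptions for the example, not the real page):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TagLookupDemo {
    // Hypothetical markup with two spans nested inside an outer span.
    static final String HTML =
        "<div id=\"tags\">Tags: "
      + "<span id=\"category\">"
      + "<span class=\"tag\">Fork</span>"
      + "<span class=\"tag\">Pin</span>"
      + "</span></div>";

    // Parses the sample markup and returns the div the question selects.
    static Element tagsDiv() {
        Document doc = Jsoup.parse(HTML);
        return doc.getElementById("tags");
    }

    public static void main(String[] args) {
        Element div = tagsDiv();

        // Looks for <class> elements, which do not exist.
        System.out.println(div.getElementsByTag("class").size()); // 0

        // Matches every <span>, the outer one and the two nested ones.
        System.out.println(div.getElementsByTag("span").size()); // 3

        // Matches only elements whose class attribute contains "tag".
        for (Element e : div.getElementsByClass("tag")) {
            System.out.println(e.text()); // Fork, then Pin
        }
    }
}
```

Note that this only helps when the nested spans are present in the HTML that Jsoup receives.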

Upvotes: 1
