dchamb

Reputation: 179

jSoup getting value of HTML tag

I am reading an HTML file from the internet, and when I print it, the console output is as follows:

<string>
       <String1>
        text
       </String1>
       <level2>
        text2
       </level2>
       <level3>
        text3
       </level3>
       <level4>
        text4
       </level4>
       <level5>
         TEXT
       </level5>
</string>
<string>
           <String2>
            text
           </String2>
           <level2>
            text2
           </level2>
           <level3>
            text3
           </level3>
           <level4>
            text4
           </level4>
           <level5>
             THIS TEXT
           </level5>
    </string>

How can I access the level5 text in the second string? I have been trying all day with no luck and would really appreciate some input from someone who knows more about this.

Here is my code:

String line = null;

            try {
                // FileReader reads text files in the default encoding.
                FileReader fileReader = new FileReader(String.valueOf(doc));

                // Always wrap FileReader in BufferedReader.
                BufferedReader bufferedReader = new BufferedReader(fileReader);

                while ((line = bufferedReader.readLine()) != null) {
                    Elements tdElements = doc.getElementsByTag("level1");
                    for(Element element : tdElements )
                    {
                        //Print the value of the element
                        System.out.println(element.text());
                    }

                }

                // Always close files.
                bufferedReader.close();
            } catch (FileNotFoundException ex) {
                System.out.println(
                        "Unable to open file '" +
                                doc + "'");
            } catch (IOException ex) {
                System.out.println(
                        "Error reading file '"
                                + doc + "'");
                // Or we could just do this:
                // ex.printStackTrace();
            }
        }
//
        catch (IOException e) {
            e.printStackTrace();
        }

Upvotes: 3

Views: 453

Answers (3)

Stephan

Reputation: 43013

You can use a CSS selector here:

string:nth-of-type(2) > level5

DEMO: http://try.jsoup.org/~8w_pfCxDhJwIseTKiKsQjQJOBRs

DESCRIPTION

string:nth-of-type(2) /* Select each "string" element that is the 2nd "string" among its siblings... */
> level5                /* ... then select its direct "level5" children */

SAMPLE CODE

Document doc = ...
Element level5Node = doc.select("string:nth-of-type(2) > level5").first();
if (level5Node == null) {
   throw new RuntimeException("Unable to locate level5 text...");
}

System.out.println(level5Node.text()); // THIS TEXT

Upvotes: 1

Solution 1: once it is wrapped in a single root element, your HTML is well-formed XML, so you can use XML tools.

You can get the second level5 with XPath: "//string[2]/level5"

Solution 2: parse it with jsoup, get the document, then use XPath as in solution 1.

See "Jsoup with XPath / XSoup": Does jsoup support xpath?

Code for solution 1:

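// yourXml is the markup from the question; the <root> wrapper gives it a single root element.
String xml = "<root>" + yourXml + "</root>";

// Document here is org.w3c.dom.Document, not org.jsoup.nodes.Document.
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(xml)));
XPath xPath = XPathFactory.newInstance().newXPath();
String expression = "//string[2]/level5";
// evaluate() returns the text of the first match, including surrounding whitespace.
String value = xPath.evaluate(expression, document);
System.out.println("EVALUATE:" + value);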
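
For anyone who wants to run solution 1 end to end, here is a minimal sketch that uses only the JDK's XML APIs. The class name Level5XPathDemo, the helper secondLevel5 and the shortened sample markup are made up for illustration; it also assumes the fragment becomes well-formed XML once wrapped in a root element, which is not true of arbitrary HTML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class Level5XPathDemo {

    // Hypothetical helper: wraps the fragment in a single root element,
    // parses it as XML and returns the trimmed text of the level5 element
    // inside the second <string> element.
    static String secondLevel5(String fragment) throws Exception {
        String xml = "<root>" + fragment + "</root>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = builder.parse(new InputSource(new StringReader(xml)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        // evaluate(String, Object) returns the string value of the first matching node.
        return xPath.evaluate("//string[2]/level5", document).trim();
    }

    public static void main(String[] args) throws Exception {
        // Shortened stand-in for the markup in the question (an assumption).
        String fragment =
              "<string><String1> text </String1><level5> TEXT </level5></string>"
            + "<string><String2> text </String2><level5> THIS TEXT </level5></string>";
        System.out.println(secondLevel5(fragment)); // THIS TEXT
    }
}
```

The trim() call is there because the question's markup puts each text value on its own indented line, so the raw XPath result carries leading and trailing whitespace.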

Upvotes: 0

Gareth1305

Reputation: 106

The code below uses jsoup to parse the markup from your question; the variable 'textToParse' holds that HTML. jsoup's pseudo-selectors let you match elements by their position in the DOM tree. Here, :eq(1) matches the string element whose sibling index is 1 (indexes are zero-based), i.e. the second one. Hope this is what you were looking for.

Document document = Jsoup.parse(textToParse);
Elements stringTags = document.select("string:eq(1)");
for(Element e : stringTags) {
    System.out.println(e.select("level5").text());
}

//Output: THIS TEXT

Upvotes: 1
