MKay

Reputation: 836

Parsing HTML to get contents and its tag

This may be an odd question, but with a detailed explanation I might get a solution (or at least a starting point).

I am working to automate localization (L10N) testing with Selenium and Java. As part of one possible approach,

As far as I know, parsers that are given filters return the corresponding plain text. But is there any way to get the underlying HTML tag along with the text? Is it even possible with Jsoup or with any other parser?

For example, I want to get <option> when the parsed text is Accounts.

<html>

<body>
  <select>
    <option value="Savings">Accounts</option>
  </select>
</body>

</html>

Upvotes: 2

Views: 81

Answers (1)

snvrthn

Reputation: 385

Using Jsoup, you can do this:

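    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    Document doc = Jsoup.parse("<html><body><select><option value=\"Savings\">Accounts</option></select></body></html>");

    String contentText = "Accounts";

    // :containsOwn matches elements whose own text (not their children's) contains the string, ignoring case
    Elements elems = doc.select(":containsOwn(" + contentText + ")");

    for (Element e : elems) {
        System.out.println("Html : " + e.outerHtml());  // the full element markup
        System.out.println("Tag  : " + e.tagName());    // just the tag name
    }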

Output

  Html : <option value="Savings">Accounts</option>
  Tag  : option
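
Since the question also asks about other parsers, here is a minimal sketch of the same lookup using only the JDK's built-in XML parser and XPath instead of Jsoup. It assumes the markup is well-formed XML, which holds for the sample in the question but often not for real-world HTML, where Jsoup is the more forgiving choice. The class name `TagForText` and the method `tagsForText` are hypothetical names chosen for illustration. Unlike `:containsOwn`, this sketch matches the whole text node exactly and is case sensitive.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TagForText {

    // Returns the tag names of every element that has a direct text node
    // equal to `text` after whitespace normalization.
    // Assumption: `markup` is well-formed XML; otherwise parsing fails.
    public static List<String> tagsForText(String markup, String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Pass the search text as an XPath variable ($t) so quotes in
            // the text cannot break the expression.
            xpath.setXPathVariableResolver(name -> text);

            NodeList hits = (NodeList) xpath.evaluate(
                    "//*[text()[normalize-space(.) = $t]]",
                    doc, XPathConstants.NODESET);

            List<String> tags = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                tags.add(hits.item(i).getNodeName());
            }
            return tags;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or query markup", e);
        }
    }

    public static void main(String[] args) {
        String markup = "<html><body><select>"
                + "<option value=\"Savings\">Accounts</option>"
                + "</select></body></html>";
        System.out.println(tagsForText(markup, "Accounts")); // prints [option]
    }
}
```

For localization testing, the returned tag names could be compared against the tags expected for each translated string; that usage is a suggestion, not something stated in the original answer.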

Upvotes: 1
