markbse

Reputation: 1123

How to get text between two Elements in DOM object?

I'm using JSoup to parse this HTML content:

<div class="submitted">
    <strong><a title="View user profile." href="/user/1">user1</a></strong> 
    on 27/09/2011 - 15:17 
    <span class="via"><a href="/goto/002">www.google.com</a></span>
</div> 

Which looks like this in web browser:

user1 on 27/09/2011 - 15:17 www.google.com

The username and the website can be parsed into variables using this:

String user    = content.getElementsByClass("submitted").first().getElementsByTag("strong").first().text(); 
String website = content.getElementsByClass("submitted").first().getElementsByClass("via").first().text();

But I'm unsure how to get "on 27/09/2011 - 15:17" into a variable. If I use

String date = content.getElementsByClass("submitted").first().text();

the result also contains the username and the website.

Upvotes: 2

Views: 1696

Answers (3)

Jeffrey Bosboom

Reputation: 13653

Select the element before the text you wish to grab, then get its next sibling node (not element), which is a text node:

Document doc = Jsoup.parse("<div class=\"submitted\">" +
  "  <strong><a title=\"View user profile.\" href=\"/user/1\">user1</a></strong>" +
  "  on 27/09/2011 - 15:17 " +
  "  <span class=\"via\"><a href=\"/goto/002\">www.google.com</a></span>" +
  "</div> ");
String str = doc.select("strong").first().nextSibling().toString().trim();
System.out.println(str);

You can also ask an element for its child text nodes and index directly (though referencing the nodes by sibling is usually more robust than indexing):

Document doc = Jsoup.parse("<div class=\"submitted\">" +
  "  <strong><a title=\"View user profile.\" href=\"/user/1\">user1</a></strong>" +
  "  on 27/09/2011 - 15:17 " +
  "  <span class=\"via\"><a href=\"/goto/002\">www.google.com</a></span>" +
  "</div> ");
String str = doc.select("div").first().textNodes().get(1).text().trim();
System.out.println(str);

Upvotes: 0

dacwe

Reputation: 43504

You can always remove the user and the website elements like this (you can clone your submitted element if you do not want the remove actions to "damage" your document):

public static void main(String[] args) throws Exception {

    Document content = Jsoup.parse(
      "<div class=\"submitted\">" +
      "  <strong><a title=\"View user profile.\" href=\"/user/1\">user1</a></strong>" +
      "  on 27/09/2011 - 15:17 " + 
      "  <span class=\"via\"><a href=\"/goto/002\">www.google.com</a></span>" +
      "</div> ");

    // create a clone of the element so we do not destroy the original
    Element submitted = content.getElementsByClass("submitted").first().clone();

    // remove the elements that you do not need 
    submitted.getElementsByTag("strong").remove();
    submitted.getElementsByClass("via").remove();

    // print the result (demo)
    System.out.println(submitted.text());
}

Outputs:

on 27/09/2011 - 15:17

Upvotes: 1

shake

Reputation: 333

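You can also parse the full string that text() returns. Assuming contentString holds that text ("user1 on 27/09/2011 - 15:17 www.google.com"), split it on spaces:

String[] parts = contentString.split(" ");

Then you can construct the string you want like this:

String date = parts[1] + " " + parts[2] + " - " + parts[4];

This gives "on 27/09/2011 - 15:17", but only as long as the username contains no spaces.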
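If the position of the words is not guaranteed, a regular expression may be a safer way to do the same string-based extraction. The following is only a sketch, assuming the text always contains the date in the form "on dd/MM/yyyy - HH:mm". The class name SubmittedDateExtractor and the method extractDate are hypothetical names chosen for illustration, and only the standard library is used.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SubmittedDateExtractor {

    // Assumed format: "on dd/MM/yyyy - HH:mm", with flexible whitespace around the dash.
    private static final Pattern DATE_PATTERN =
            Pattern.compile("\\bon\\s+(\\d{2}/\\d{2}/\\d{4})\\s*-\\s*(\\d{2}:\\d{2})");

    // Returns the "on <date> - <time>" fragment, or empty if the text does not match.
    static Optional<String> extractDate(String contentString) {
        Matcher m = DATE_PATTERN.matcher(contentString);
        if (!m.find()) {
            return Optional.empty();
        }
        // Rebuild the fragment with normalized spacing.
        return Optional.of("on " + m.group(1) + " - " + m.group(2));
    }

    public static void main(String[] args) {
        // Same text the question gets from text() on the "submitted" element.
        String contentString = "user1 on 27/09/2011 - 15:17 www.google.com";
        System.out.println(extractDate(contentString).orElse("(no date found)"));
        // prints: on 27/09/2011 - 15:17
    }
}
```

Unlike the index-based split, this does not depend on how many words come before the date, and it returns an empty Optional instead of throwing when the text has an unexpected shape.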

Upvotes: 0
