Reputation: 1123
I'm using JSoup to parse this HTML content:
<div class="submitted">
<strong><a title="View user profile." href="/user/1">user1</a></strong>
on 27/09/2011 - 15:17
<span class="via"><a href="/goto/002">www.google.com</a></span>
</div>
Which looks like this in a web browser:
user1 on 27/09/2011 - 15:17 www.google.com
The username and the website can be parsed into variables using this:
String user = content.getElementsByClass("submitted").first().getElementsByTag("strong").first().text();
String website = content.getElementsByClass("submitted").first().getElementsByClass("via").first().text();
But I'm unsure how to get the "on 27/09/2011 - 15:17" part into a variable. If I use
String date = content.getElementsByClass("submitted").first().text();
the result also contains the username and the website.
Upvotes: 2
Views: 1696
Reputation: 13653
Select the element before the text you wish to grab, then get its next sibling node (not element), which is a text node:
Document doc = Jsoup.parse("<div class=\"submitted\">" +
" <strong><a title=\"View user profile.\" href=\"/user/1\">user1</a></strong>" +
" on 27/09/2011 - 15:17 " +
" <span class=\"via\"><a href=\"/goto/002\">www.google.com</a></span>" +
"</div> ");
String str = doc.select("strong").first().nextSibling().toString().trim();
System.out.println(str);
You can also ask an element for its child text nodes and index directly (though referencing the nodes by sibling is usually more robust than indexing):
Document doc = Jsoup.parse(
"<div class=\"submitted\">" +
" <strong><a title=\"View user profile.\" href=\"/user/1\">user1</a></strong>" +
" on 27/09/2011 - 15:17 " +
" <span class=\"via\"><a href=\"/goto/002\">www.google.com</a></span>" +
"</div> ");
String str = doc.select("div").first().textNodes().get(1).text().trim();
System.out.println(str);
Upvotes: 0
Reputation: 43504
You can always remove the user and the website elements like this (you can clone your submitted element if you do not want the remove actions to "damage" your document):
public static void main(String[] args) throws Exception {
Document content = Jsoup.parse(
"<div class=\"submitted\">" +
" <strong><a title=\"View user profile.\" href=\"/user/1\">user1</a></strong>" +
" on 27/09/2011 - 15:17 " +
" <span class=\"via\"><a href=\"/goto/002\">www.google.com</a></span>" +
"</div> ");
// create a clone of the element so we do not destroy the original
Element submitted = content.getElementsByClass("submitted").first().clone();
// remove the elements that you do not need
submitted.getElementsByTag("strong").remove();
submitted.getElementsByClass("via").remove();
// print the result (demo)
System.out.println(submitted.text());
}
Outputs:
on 27/09/2011 - 15:17
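If you need the value as a real date rather than a string, the extracted text can be handed to java.time (Java 8 or later). This is only a minimal sketch: the class name SubmittedDateParser and the method name parse are made up for illustration, and the pattern assumes the text always has exactly the "on dd/MM/yyyy - HH:mm" layout shown above.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Jsoup.
class SubmittedDateParser {
    // Assumed layout: "on 27/09/2011 - 15:17". The word 'on' is quoted so it is
    // treated as literal text instead of pattern letters.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("'on' dd/MM/yyyy - HH:mm");

    // Trims surrounding whitespace, then parses; throws DateTimeParseException
    // if the text does not match the assumed layout.
    static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text.trim(), FORMAT);
    }

    public static void main(String[] args) {
        LocalDateTime submittedAt = parse("on 27/09/2011 - 15:17");
        System.out.println(submittedAt); // prints 2011-09-27T15:17
    }
}
```

In practice you would pass submitted.text() to parse instead of the hard-coded string.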
Upvotes: 1
Reputation: 333
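You can also parse the string that you already get (contentString here, e.g. the full text of the submitted element). Split it on spaces:
String[] parts = contentString.split(" ");
Then you can construct the string you want like this:
String date = parts[1] + " " + parts[2] + " - " + parts[4];
This gives you the string you need, as long as the username contains no spaces.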
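Fixed positions break as soon as the username contains a space, so a regular expression may be a safer way to pull the date out of the full text. The following is a rough sketch, not a definitive solution: the class name SubmittedDateExtractor and the method extractDate are hypothetical, and the pattern assumes the date always appears as "on dd/MM/yyyy - HH:mm".

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup.
class SubmittedDateExtractor {
    // Assumed layout: the word "on", a dd/MM/yyyy date, a dash, then HH:mm.
    private static final Pattern DATE_PART =
            Pattern.compile("\\bon\\s+\\d{2}/\\d{2}/\\d{4}\\s*-\\s*\\d{2}:\\d{2}");

    // Returns the first matching date fragment, or an empty Optional if none is found.
    static Optional<String> extractDate(String text) {
        Matcher matcher = DATE_PART.matcher(text);
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String contentString = "user1 on 27/09/2011 - 15:17 www.google.com";
        // prints: on 27/09/2011 - 15:17
        System.out.println(extractDate(contentString).orElse("(no date found)"));
    }
}
```

Returning an Optional makes the "no date in this text" case explicit instead of failing with an ArrayIndexOutOfBoundsException, which is what the index-based version does on shorter input.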
Upvotes: 0