Lostsoul

Reputation: 26037

How to not get contents of child elements within HtmlUnit?

I have the following:

<th>
Q4/10
<br>
<span> Nov 30, 2010 </span>
</th>

and I'd like to get Q4/10 but not the date that follows. I'm not sure how to do this in HtmlUnit. I know I could take the cell's full text, split it on whitespace, and keep everything before the first space, but I'm looking for something based on the tags themselves.

Upvotes: 0

Views: 964

Answers (1)

Rodney Gitzel

Reputation: 2710

If you know that the text you want comes before any child elements, you can just grab the cell's first child, which is a text node containing your text plus some whitespace:

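HtmlTableHeaderCell th = ...
// The first child of <th> is the text node holding "Q4/10" and surrounding whitespace.
// getNodeValue() returns that node's raw text; trim() strips the whitespace.
System.err.println( th.getFirstChild().getNodeValue().trim() ) ;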

A more general solution is to loop through the children of th, collecting the text nodes and skipping any child elements.
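A rough sketch of that loop, written against the standard org.w3c.dom.Node interface (which HtmlUnit's DomNode implements). The class name DirectText and the method name ownText are made up for illustration and are not part of HtmlUnit:

```java
import org.w3c.dom.Node;

// Hypothetical helper, not part of HtmlUnit.
final class DirectText {

    private DirectText() {}

    /**
     * Concatenates the values of the direct text-node children of parent,
     * skipping child elements such as <br> and <span>, then trims the result.
     */
    static String ownText(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            // Only direct text nodes are kept; text inside child elements is never visited.
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString().trim();
    }
}
```

With the th from the snippet above, this would be called as DirectText.ownText(th). Unlike the first-child approach, it should also pick up text that appears between or after child elements, so it does not depend on the text coming first.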

Upvotes: 1
