How to define an xpath expression that only retrieves hyphenated elements from the first of two similar divs?

Question

The divs below appear in that order in the HTML I am parsing.

//div[contains(@class,'top-container')]//font/text()

I'm using the xpath expression above to try to get any data in the first div below in which a hyphen is used to delimit the data:

Wednesday - Chess at Higgins Stadium
Thursday - Cook-off

The problem is that I am also getting data from the second div below, such as:

Monday 10:00 - 11:00
Tuesday 10:00 - 11:00

How do I retrieve only the data from the first div? (I also want to exclude any elements in the first div that do not contain this hyphenated data.)

 
 
Wednesday - Chess at Higgins Stadium
Thursday - Cook-off

Alex Dawkin
Monday 10:00 - 11:00
Tuesday 10:00 - 11:00

Mads Hansen · Accepted Answer

Your XPath was matching any font element that is a descendant of the "top-container" div.

div[1] will address the first div child element of the "top-container" element. If you add that to your XPATH, it will return the desired results.

//div[contains(concat(' ',@class,' '),' top-container ')]/div[1]//font/text()
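The expression above also wraps @class in concat(' ', @class, ' ') instead of testing it directly. That makes the class test match whole class names rather than substrings. A minimal Python sketch of the same string logic follows; the function names are hypothetical and exist only for this illustration.

```python
def naive_match(class_attr: str, name: str) -> bool:
    # Mirrors contains(@class, 'name'): a plain substring test.
    return name in class_attr


def token_match(class_attr: str, name: str) -> bool:
    # Mirrors contains(concat(' ', @class, ' '), ' name '):
    # pad both strings with spaces so only whole tokens match.
    return f" {name} " in f" {class_attr} "


print(naive_match("top-container-wide", "top-container"))  # True
print(token_match("top-container-wide", "top-container"))  # False
print(token_match("row top-container", "top-container"))   # True
```

The padded form avoids false positives on class names that merely start or end with the target name. It assumes the class names are separated by single spaces.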

If you want to ensure that only text() nodes that contain "-" are addressed, then you should also add a predicate filter to the text().

//div[contains(concat(' ',@class,' '),' top-container ')]/div[1]//font/text()[contains(.,'-')]
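As a rough check of what this selects, here is a sketch using only Python's standard library. Two assumptions apply. First, the markup is hypothetical (the font and br layout and the "No events Friday" line are invented for the test). Second, xml.etree.ElementTree supports only a subset of XPath, with no contains() or text(), so the class test and the hyphen test are done in Python rather than in the path itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, assumed only for illustration.
SAMPLE = """
<html><body>
<div class="top-container">
  <div>
    <font>Wednesday - Chess at Higgins Stadium<br/>Thursday - Cook-off<br/>No events Friday</font>
  </div>
  <div>
    <font>Alex Dawkin<br/>Monday 10:00 - 11:00<br/>Tuesday 10:00 - 11:00</font>
  </div>
</div>
</body></html>
"""


def hyphenated_first_div_text(markup):
    root = ET.fromstring(markup.strip())
    results = []
    for container in root.iter("div"):
        # Whole-token class test, like contains(concat(' ',@class,' '),' top-container ').
        if "top-container" not in container.get("class", "").split():
            continue
        # div[1]//font: font descendants of the first div child only.
        for font in container.findall("./div[1]//font"):
            # itertext() yields every text node under <font>; with only
            # <br/> children, that is the same set as font/text().
            for text in font.itertext():
                if "-" in text:  # same test as [contains(.,'-')]
                    results.append(text.strip())
    return results


print(hyphenated_first_div_text(SAMPLE))
# ['Wednesday - Chess at Higgins Stadium', 'Thursday - Cook-off']
```

A library with full XPath 1.0 support could evaluate the expression directly; the split between path and Python filter here is only a workaround for ElementTree's limits.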

Instead of checking only for nodes that contain "-", how would you modify the last expression to just check for non-empty strings?

If you want to return any text() node with a value, then the predicate filter on text() is not necessary. If a text node doesn't have content, then it isn't a text node and won't be selected.

However, if you only want to select text() nodes that contain text other than whitespace, you could use this expression:

//div[contains(concat(' ',@class,' '),' top-container ')]/div[1]//font/text()[normalize-space()]

normalize-space() strips leading and trailing whitespace and collapses internal runs of whitespace into a single space. If a text() node contains only whitespace (spaces, tabs, or line breaks), the result is an empty string, which evaluates to false() in the predicate filter, so only text() nodes containing something other than whitespace are selected.
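To make the predicate's behaviour concrete, here is a small Python sketch that imitates normalize-space() and the string-to-boolean rule. It is an approximation written for this illustration, not an XPath engine, and the helper names are hypothetical.

```python
import re

# XPath 1.0 whitespace: space, tab, carriage return, line feed.
_XPATH_WS = re.compile(r"[ \t\r\n]+")


def normalize_space(value: str) -> str:
    # Collapse each whitespace run to one space, then trim both ends.
    return _XPATH_WS.sub(" ", value).strip(" ")


def predicate_is_true(value: str) -> bool:
    # A string used as an XPath predicate is true when it is non-empty.
    return normalize_space(value) != ""


print(predicate_is_true("  \n\t "))                 # False
print(predicate_is_true("  Thursday - Cook-off "))  # True
print(repr(normalize_space("  Monday   10:00 \n- 11:00 ")))
# 'Monday 10:00 - 11:00'
```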
