xPath Help - Selecting a specific word within a string

Question

I am looking to extract a part of a string using xPath.

Full string -

Informational (nonfiction), 1,303 words, Level S (Grade 3)

HTML code:


    Informational (nonfiction),
1,303 words,
Level S  (Grade 3)

I want to extract just the word count from these strings, i.e. "1,303 words" in this case.

The xPath of this string looks like

//*[@id="contentarea-inner"]/div[3]/div[2]/div

Webpage in question - https://www.readinga-z.com/books/leveled-books/book/?id=820

Please advise on how I can modify the xPath to extract only the number of words from the page. I have several thousand pages to get this info from.

Thanks

Igor Savinkin · Accepted Answer

Basically you need both xpath and regex:

1. Get the text of the div node by xPath (see Shubham Jain's code).
2. Apply a regex to the text, for example this: \s[,\d]+(?= words). The match includes the leading whitespace character, so trim it afterwards. See the regex at work on the text node.
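
The two steps above can be sketched in Python roughly as follows. This is only an illustration under assumptions: the SAMPLE markup and its "details" class are hypothetical stand-ins (the real page's HTML is not shown here), and the standard-library ElementTree supports only a limited XPath subset, so for real pages a full XPath engine such as lxml would likely be needed to evaluate the expression from the question. The helper name extract_word_count is also made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the page markup; the real page's
# structure and class names are assumptions, not taken from the site.
SAMPLE = """
<div id="contentarea-inner">
  <div class="details">
    Informational (nonfiction),
    1,303 words,
    Level S  (Grade 3)
  </div>
</div>
"""

# Regex from the answer: one whitespace character, then digits/commas,
# followed (lookahead, not consumed) by " words".
WORD_COUNT_RE = re.compile(r"\s[,\d]+(?= words)")


def extract_word_count(text):
    """Return the word count as an int, or None if the pattern is absent."""
    match = WORD_COUNT_RE.search(text)
    if match is None:
        return None
    # The match includes the leading whitespace; strip it and the commas.
    return int(match.group().strip().replace(",", ""))


# Step 1: get the text of the div node via a (limited) XPath expression.
root = ET.fromstring(SAMPLE)
node = root.find(".//div[@class='details']")
text = "".join(node.itertext())

# Step 2: apply the regex to that text.
print(extract_word_count(text))
```

For several thousand pages, the same function could be called in a loop over the text fetched from each page; returning None when nothing matches makes it easy to log pages that need a manual look.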
