Reputation: 15
I am trying to find an element using XPath and get the element's text value. Kindly bear with me and help me resolve the issue.
Visit Click here
Visit Click here
In <div class="medium-8 columns">
- I need to extract paragraph text only up to “Further History” (i.e. stop at “Further History”, not including “Further History”).
In <div class="medium-8 columns">
- Here I need to extract paragraph text after “Further History” (not including “Further History”).
I am using the XPath expression below, which is not returning anything.
(//STRONG[not(contains(text(), 'Further History'))]/following-sibling::text() | //STRONG[not(contains(text(), 'Further History'))]/../following-sibling::p/text()) | //div[contains(@class, 'articlecontent')]
Upvotes: 0
Views: 2333
Reputation: 22617
HTML might not be case-sensitive, but XML (and, consequently, XPath) is: "STRONG" is not the same as "strong", and in the HTML you linked to, there is only "strong".
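The case-sensitivity point can be checked with a minimal sketch. It assumes a tiny hypothetical fragment rather than the real page, and uses Python's standard-library ElementTree, whose limited XPath subset is enough for a plain tag-name lookup:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment standing in for the linked page (not the real markup).
fragment = ET.fromstring("<div><p><strong>Further History</strong></p></div>")

# Tag names are matched case-sensitively, so the upper-case name finds nothing.
upper = fragment.findall(".//STRONG")
lower = fragment.findall(".//strong")

print(len(upper), len(lower))  # prints "0 1"
```

Some HTML parsers normalise tag names to lower case while parsing, which is another reason to write lower-case names in path expressions.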
A useful XPath expression to retrieve the text you are interested in might be
//div[@class="medium-8 columns"]/p[following-sibling::p/strong]/text()
which means
//div                           select all `div` elements, anywhere in the document
[@class="medium-8 columns"]     but only if they have a `class` attribute whose value is
                                equal to "medium-8 columns"
/p                              of those `div` elements, select all `p` child elements
[following-sibling::p/strong]   but only if they have a following sibling `p` which has a
                                `strong` element as a child
/text()                         of the remaining `p` elements, select the text content
and which would return (individual results separated by `-----------------------`):
Tim Bajarin is recognized as one of the leading industry
consultants, analysts and futurists, covering the field of
personal computers and consumer technology. Mr. Bajarin has
been with Creative Strategies since 1981 and has served as a
consultant to most of the leading hardware and software
vendors in the industry including IBM, Apple, Xerox, Hewlett
Packard/Compaq, Dell, AT&T, Microsoft, Polaroid, Lotus,
Epson, Toshiba and numerous others.
-----------------------
His articles and/or analysis have appeared in USA Today, Wall
Street Journal, The New York Times, Time and Newsweek
magazines, BusinessWeek and most of the leading business and
trade publications. He has appeared as a business analyst
commenting on the computer industry on all of the major
television networks and was a frequent guest on PBS’ The
Computer Chronicles.
-----------------------
Mr. Bajarin has been a columnist for US computer industry
publications such as PC Week and Computer Reseller News and
wrote for ABCNEWS.COM for two years and Mobile Computing for
10 years. His columns currently appear in Asia Computer
Weekly, Personal Computer World (UK), and Microscope (UK) as
well as Mobile Enterprise Magazine. His various columns and
analyses are syndicated in over 30 countries.
For your second case:
Here I need to extract paragraph text after “Further History” (not including “Further History”)
just replace following-sibling with preceding-sibling in the path expression:
//div[@class="medium-8 columns"]/p[preceding-sibling::p/strong]/text()
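To make the two sibling predicates concrete, here is a rough sketch in plain Python. It is not the XPath evaluation itself: the standard-library ElementTree does not support the following-sibling or preceding-sibling axes, so the predicates are emulated by comparing positions among the `p` children. The markup in `SAMPLE` and the helper name `split_at_strong` are hypothetical, and the sketch assumes well-formed markup with simple paragraphs that hold a single run of text.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the linked page (not the real markup).
SAMPLE = """<html><body>
<div class="medium-8 columns">
  <p>First bio paragraph.</p>
  <p>Second bio paragraph.</p>
  <p><strong>Further History</strong></p>
  <p>First history paragraph.</p>
  <p>Second history paragraph.</p>
</div>
</body></html>"""


def split_at_strong(markup):
    """Return (before, after) paragraph texts around the <p> that holds <strong>."""
    root = ET.fromstring(markup)
    # Same filter as //div[@class="medium-8 columns"]
    div = root.find(".//div[@class='medium-8 columns']")
    paragraphs = div.findall("p")
    # Positions of the <p> elements that have a <strong> child.
    markers = [i for i, p in enumerate(paragraphs) if p.find("strong") is not None]
    # Emulates p[following-sibling::p/strong]: some later sibling is a marker.
    before = [(p.text or "").strip()
              for i, p in enumerate(paragraphs) if any(m > i for m in markers)]
    # Emulates p[preceding-sibling::p/strong]: some earlier sibling is a marker.
    after = [(p.text or "").strip()
             for i, p in enumerate(paragraphs) if any(m < i for m in markers)]
    return before, after


before, after = split_at_strong(SAMPLE)
print(before)
print(after)
```

With a library that implements full XPath 1.0, the two path expressions can be evaluated directly and this position bookkeeping is unnecessary.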
Upvotes: 2