Reputation: 4486
I have a rather basic question here, which means I'm probably missing something. I'm using Nokogiri to scrape a site.
I want to extract the text AFTER the end of a strong tag within a p element, which looks like this:
<p style="padding-bottom:0px;"><strong>Location:</strong> Cape Town</p>
Currently my code is as follows:
location = detail_page.css('p[style="padding-bottom:0px;"]').text
This obviously returns the <strong>Location:</strong> bit as well. Is there a way to do this without using a regex?
The reason for asking is that there are other elements in the same format containing information which I need, so I can't just delete the strong elements.
Thanks in advance
Marc
Upvotes: 0
Views: 733
Reputation: 118261
Here is how I would do it:
require 'nokogiri'
@doc = Nokogiri::HTML.parse('<p style="padding-bottom:0px;"><strong>Location:</strong> Cape Town</p>')
@doc.at_css('p[style*="padding-bottom:0px;"] > text()').text.strip
# => "Cape Town"
Upvotes: 0
Reputation: 79723
You could use XPath:
detail_page.xpath('//p[@style="padding-bottom:0px;"]/strong/following-sibling::text()')
This selects any text nodes that are following siblings of strong elements that are in turn children of p elements with a style attribute with the value padding-bottom:0px;.
Upvotes: 1