Having loaded a (X)HTML page, I'm trying to get the value of a meta tag's "content" attribute. For example, given:
<meta name="author" content="John Smith" />
I'd like to extract the value "John Smith".
I know how to do this with XPath, and I understand that CSS was designed primarily for selecting elements. However, Nokogiri supports custom CSS pseudo-classes, which I thought could be used like this:
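require 'nokogiri'
require 'open-uri'

# Handler object: Nokogiri exposes its methods as custom pseudo-classes
class CSSext
  # Intended to return the attribute nodes named +tag+
  def attr(nodeset, tag)
    nodeset.first.attribute_nodes.find_all { |node| node.name == tag }
  end
end

doc = Nokogiri::HTML(URI.open(someurl))
doc.css("meta[name='author']:attr('content')", CSSext.new)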
However, this returns the same result as
doc.css("meta[name='author']")
What gives? Nokogiri runs both CSS and XPath searches through the same engine (CSS selectors are translated into XPath), so anything that's possible in XPath should be doable in CSS. How should I go about extracting the attribute value?
Why not just:
doc.at("meta[name='author']")['content']
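As far as I understand, pseudo-classes can only filter the node set; they cannot replace it with some other value, such as the value of one of the nodes' attributes. Nokogiri turns a custom pseudo-class into a call inside an XPath predicate, so the non-empty node set returned by your attr handler simply counts as "true" and the meta element itself stays in the result.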