user1955506

Reputation: 103

Trying to extract attribute values using Nokogiri with custom pseudoclass CSS selectors

Having loaded an (X)HTML page, I'm trying to get the value of a meta tag's "content" attribute. For example, given:

<meta name="author" content="John Smith" />

I'd like to extract the value "John Smith".

I know how to do that using XPath, and I understand that CSS was meant primarily for element selection. However, Nokogiri supports defining custom CSS pseudoclasses, which I thought could be used as follows:

class CSSext
  def attr(nodeset, tag)
    nodeset.first.attribute_nodes.find_all {|node| node.name == tag}
  end
end

doc = Nokogiri::HTML(open(someurl))
doc.css("meta[name='author']:attr('content')", CSSext.new)

However, this returns the same result as

doc.css("meta[name='author']")

What gives? Nokogiri uses the same engine underneath for both CSS and XPath searches so anything that's possible in XPath should be doable in CSS. How should I go about extracting the attribute value?

Upvotes: 7

Views: 4776

Answers (1)

akuhn

Reputation: 27793

Why not just read the attribute from the matched node?

doc.at("meta[name='author']")['content']

As far as I understand, pseudoclasses can only be used to filter the nodeset, not to replace the nodeset with some other value, such as the value of one of a node's attributes.

Upvotes: 7
