Reputation: 95
I'm having trouble figuring out why I can't get keywords to parse properly through Nokogiri. In the following example, pulling the a href links works properly, but I cannot figure out how to pull the keywords.
This is the code I have thus far:
.....
doc = Nokogiri::HTML(open("http://www.cnn.com"))
doc.xpath('//a/@href').each do |node|
#doc.xpath("//meta[@name='Keywords']").each do |node|
  puts node.text
end
....
This successfully prints all of the a href values in the page, but when I try to use it for keywords it doesn't show anything. I've tried several variations of this with no luck. I assume that the ".text" call after node is wrong, but I'm not sure.
My apologies for how rough this code is; I'm doing my best to learn here.
Upvotes: 5
Views: 2641
Reputation: 370102
You're correct, the problem is text. text returns the text between the opening tag and the closing tag. Since meta tags are empty, this gives you the empty string. You want the value of the "content" attribute instead.
doc.xpath("//meta[@name='Keywords']/@content").each do |attr|
  puts attr.value
end
Since you know that there will be only one meta-tag with the name "keywords", you don't actually need to loop through the results, but can take the first item directly like this:
puts doc.xpath("//meta[@name='Keywords']/@content").first.value
Note, however, that this will raise an error if there is no meta tag with the name "Keywords", because first then returns nil and nil has no value method. For that reason the first option might be preferable.
Upvotes: 7