paradoxic

Reputation: 95

Nokogiri HTML parsing question

I'm having trouble figuring out why I can't get the meta keywords to parse properly with Nokogiri. In the following example, printing the href of every link works, but I cannot figure out how to pull the keywords.

This is the code I have thus far:

.....

doc = Nokogiri::HTML(open("http://www.cnn.com"))
doc.xpath('//a/@href').each do |node|
#doc.xpath("//meta[@name='Keywords']").each do |node|
  puts node.text

....

This successfully prints every href value on the page, but when I switch to the commented-out keywords line it doesn't print anything. I've tried several variations of this with no luck. I assume that the ".text" call on node is wrong, but I'm not sure.

My apologies for how rough this code is; I'm doing my best to learn here.

Upvotes: 5

Views: 2641

Answers (1)

sepp2k

Reputation: 370102

You're correct: the problem is text. Calling text on a node returns the text between its opening and closing tags. Since meta tags are empty elements, this gives you an empty string. You want the value of the "content" attribute instead.

doc.xpath("//meta[@name='Keywords']/@content").each do |attr|
  puts attr.value
end

Since you know that there will be only one meta tag with the name "Keywords", you don't actually need to loop through the results, but can take the first item directly like this:

puts doc.xpath("//meta[@name='Keywords']/@content").first.value

Note, however, that this will raise an error if there is no meta tag with the name "Keywords": first returns nil on an empty result, and calling value on nil fails. So the first option might be preferable.

Upvotes: 7
