user1015523

Reputation: 332

Parsing just the content in HTML nodes via Nokogiri in Ruby

Suppose I have parsed the following line of HTML:

<a href="http://www.google.com" class="blah"><img src="logo.png" border="0"></img><br><span class="red">Go to google!</span></a>

This is just an example, but how would I go about stripping everything except the following?

http://www.google.com
logo.png
Go to google!

Also, is it possible to search using wildcards?

Upvotes: 0

Views: 153

Answers (2)

pguardiario

Reputation: 54984

Maybe like this:

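require 'nokogiri'
doc = Nokogiri::HTML '<a href="http://www.google.com" class="blah"><img src="logo.png" border="0"></img><br><span class="red">Go to google!</span></a>'
# Collect every href value, every src value, and every text node, in document order
doc.xpath('//*/@href|//*/@src|//*/text()').map(&:to_s)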

Upvotes: 1

nkm

Reputation: 5914

If you are able to use additional gems, this becomes a very simple job. I would recommend the Mechanize gem. Reference: http://mechanize.rubyforge.org/Mechanize.html

Upvotes: 1
