Reputation: 332
Suppose I have parsed the following line of HTML:
<a href="http://www.google.com" class="blah"><img src="logo.png" border="0"></img><br><span class="red">Go to google!</span></a>
This is just an example, but how would I go about stripping everything EXCEPT the following:
http://www.google.com
logo.png
Go to google!
Also, is it possible to search using wildcards?
Upvotes: 0
Views: 153
Reputation: 54984
Maybe like this:
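require 'nokogiri'
doc = Nokogiri::HTML '<a href="http://www.google.com" class="blah"><img src="logo.png" border="0"></img><br><span class="red">Go to google!</span></a>'
# Select every href attribute, src attribute, and text node, then convert each to a string
doc.xpath('//*/@href|//*/@src|//*/text()').map(&:to_s)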
Upvotes: 1
Reputation: 5914
If you can use gems, this becomes a very simple job. I would recommend the Mechanize gem.
Reference: http://mechanize.rubyforge.org/Mechanize.html
Upvotes: 1