Reputation: 1547
I have HTML like the following coming from the server:
<html ...>
<head ...>
....
<link href="http://mydomain.com/Digital_Cameras--~all" rel="canonical" />
<link href="http://mydomain.com/Digital_Cameras--~all/sec_~product_list/sb_~1/pp_~2" rel="next" />
...
</head>
<body>
...
</body>
</html>
If b holds the browser object navigated to the page I need to look through, I can detect rel="canonical" with b.html.include?, but how can I retrieve the entire line where this substring was found? I also need the next (non-empty) line.
Upvotes: 1
Views: 422
Reputation: 46846
You can use a CSS locator (or XPath) to get the link elements.
The following would return the html (which would be the line) for the link element that has the rel attribute value of "canonical":
b.element(:css => 'link[rel="canonical"]').html
#=> <link href="http://mydomain.com/Digital_Cameras--~all" rel="canonical" />
I am not sure what you mean by "I also need the next (not empty) one". If you mean that you want the one with the rel attribute value of "next", you can similarly do:
b.element(:css => 'link[rel="next"]').html
#=> <link href="http://mydomain.com/Digital_Cameras--~all/sec_~product_list/sb_~1/pp_~2" rel="next" />
Upvotes: 5
Reputation: 5283
You could use String#each_line to iterate through each line in b.html and check for rel=:
b.goto('http://www.iana.org/domains/special')
b.html.each_line { |line| puts line if line.include?('rel=') }
That should print every line containing rel= (although it could also print lines that you don't want, such as <a> tags with rel attributes).
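To get the matching line together with the next non-empty one, as the question asks, the same line-based idea can be extended. This is only a rough sketch: the html string below stands in for b.html, line_and_next is a hypothetical helper name, and it assumes the server emits one tag per line, which is not guaranteed.

```ruby
# Sample markup standing in for b.html (assumption: one tag per line).
html = <<~HTML
  <head>
  <link href="http://mydomain.com/Digital_Cameras--~all" rel="canonical" />

  <link href="http://mydomain.com/Digital_Cameras--~all/sec_~product_list/sb_~1/pp_~2" rel="next" />
  </head>
HTML

# Hypothetical helper: returns [matching_line, next_non_empty_line],
# or [nil, nil] when no line contains the needle.
def line_and_next(html, needle)
  lines = html.each_line.map(&:strip)
  index = lines.index { |line| line.include?(needle) }
  return [nil, nil] if index.nil?

  # Skip blank lines that follow the match.
  following = lines.drop(index + 1).find { |line| !line.empty? }
  [lines[index], following]
end

canonical, following = line_and_next(html, 'rel="canonical"')
puts canonical
puts following
```

With a live browser you would pass b.html instead of the sample string. If the markup is minified onto a single line, this approach breaks down and an element-based lookup is the safer choice.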
Alternately, you could use nokogiri to parse the HTML:
require 'nokogiri'
require 'open-uri'
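# On Ruby 3.0+ Kernel#open no longer fetches URLs; open-uri provides URI.open for that
doc = Nokogiri::HTML(URI.open("http://www.iana.org/domains/special"))
# Select every <link> element and print its markup
nodes = doc.css('link')
nodes.each { |node| puts node }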
Upvotes: 0