Reputation: 103
I am trying to get the "cell4" value from an HTML table like the following, using Ruby, XPath and Nokogiri:
<html>
<body>
<h1>Heading</h1>
<p>paragraph.</p>
<h4>Two rows and three columns:</h4>
<table border="0">
<tr>
<td>cell1</td>
<td>cell2</td>
</tr>
<tr>
<td>cell3</td>
<td>cell4</td>
</tr>
</table>
</body>
</html>
I have the following simple code, but it returns []. This question must be simple enough, but I couldn't find anything on Google that hits the spot.
#!/usr/bin/ruby -w
require 'rubygems'
require 'nokogiri'
page1 = Nokogiri::HTML('test_simple.html')
a = page1.xpath("//html/body/table/tr[2]/td[2]")
p a
The XPath works as intended with REXML, so the expression itself is correct, but it does not work with Nokogiri. Since this is going to be used on larger HTML files, REXML cannot be used. The problem does not seem to be limited to tables; the contents of other tags cannot be scraped either.
Upvotes: 3
Views: 4156
Reputation: 103
Thanks to taro's comment, I was able to solve the issue with a little effort.
Here is the corrected code:
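#!/usr/bin/ruby -w
require 'rubygems'
require 'nokogiri'
# Pass an open file handle; a bare string would be parsed as the HTML itself
page1 = Nokogiri::HTML(File.open('test_simple.html'))
# .text extracts the cell's content from the matched node set
a = page1.xpath("/html/body/table/tr[2]/td[2]").text
p a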
Upvotes: 4
Reputation: 17629
IMHO it is a lot easier to work with the CSS API in Nokogiri (XPath always gives me headaches):
page.css('td') # should return a NodeSet (array-like) of the 4 table cell nodes
page.css('td')[3] # returns the 4th 'td' node; indexing starts at 0
Upvotes: 7