Reputation: 11
I'm a bit of a newbie and I'm trying to scrape some data from a table, but I'm not having much luck using XPath. I can get the first field I need, but then... nothing.
The table structure for each row is as follows:
<tr bgcolor="#FFF7E7">
<td valign="Top"><font color="#8C4510">
<span id="DataGrid1__ctl3_Label2">Index</span>
</font></td>
<td><font color="#8C4510"><a href="javascript:__doPostBack('DataGrid1$_ctl3$_ctl0','')"><font color="#8C4510">Title</font></a></font></td>
<td><font color="#8C4510"><a href="javascript:__doPostBack('DataGrid1$_ctl3$_ctl2','')"><font color="#8C4510">People</font></a></font></td>
<td valign="Top"><font color="#8C4510">Date</font></td><td><font color="#8C4510"><a href="javascript:__doPostBack('DataGrid1$_ctl3$_ctl4','')">
<font color="#8C4510">Text</font></a></font></td>
<td><font color="#8C4510"><a href="javascript:__doPostBack('DataGrid1$_ctl3$_ctl6','')"><font color="#8C4510">Outcome</font></a></font></td>
<td valign="Top">
<font color="#8C4510"><a href="javascript:__doPostBack('DataGrid1$_ctl3$_ctl8','')"><font color="#8C4510">Click link for more</font></a></font></td>
</tr>
I'm trying to extract the Index, Title, People, Text, and Outcome fields as well as the link. I can extract the Index, but I can't seem to get the rest.
In my Ruby code, the call that gets the table seems to work, but the loop that extracts the fields for each row does not, apart from the Index.
Any help would be great.
Upvotes: 1
Views: 865
Reputation: 26763
From the excerpt you gave, you can extract the text and links with the following XPath query:
require 'nokogiri'

# Parse the saved page; the block form closes the file automatically.
doc = File.open('test.html') { |f| Nokogiri::HTML(f) }

# Print the text and href of every link inside a table cell.
doc.xpath('//tr//td//a').each do |node|
  puts "#{node.text.strip}: #{node['href']}"
end
However, without seeing the other rows in the table, I'm not sure whether this will help with the rest.
Upvotes: 2