Reputation: 648
I am parsing a web page with a standard structure, as follows:
<html>
<body>
<table>
<tbody>
<tr class="active">
<td>name1</td>
<td>name2</td>
<td>name3</td>
</tr>
</tbody>
</table>
</body>
</html>
For the life of me, I can't access the 'tbody' or 'tr' elements.
require 'open-uri'
require 'nokogiri'

response = URI.open('http://my_url')
node = Nokogiri::HTML(response).css('table')
puts node
Returns
#<Nokogiri::XML::Element:0x8294c08c name="table" attributes=[#<Nokogiri::XML::Attr:0x8294c014 name="id" value="beta-users">] children=[#<Nokogiri::XML::Text:0x82953bc0 "\n">]>
I have tried various tricks but can't seem to dig deeper down to a lower-level child than 'table'.
At best, I can get to the lowest-level Text object by using
node.children
but
node.children.text
returns "\n".
Despite searching for some hours, I am none the wiser about how to sort it out. Any thoughts?
Upvotes: 0
Views: 512
Reputation: 2901
There is an unclosed class attribute value in your sample. It should be:
<html>
<body>
<table>
<tbody>
<tr class="active">
<td>name1</td>
<td>name2</td>
<td>name3</td>
</tr>
</tbody>
</table>
</body>
</html>
After correcting this, you can:
node = Nokogiri::HTML(response).css('table tbody tr td')
node.each {|child| puts child.text}
name1
name2
name3
Upvotes: 1