Jonathan_W

Reputation: 648

Nokogiri and tables

I am parsing a web page with a standard structure as follows:

<html>
  <body>
     <table>
        <tbody>
           <tr class="active">
             <td>name1</td>
             <td>name2</td>
             <td>name3</td>
          </tr>
       </tbody>
     </table>
  </body>
</html>

For the life of me, I can't access the 'tbody' or 'tr' elements.

require 'open-uri'
require 'nokogiri'

response = URI.open('http://my_url')
node = Nokogiri::HTML(response).css('table')
puts node

Returns

#<Nokogiri::XML::Element:0x8294c08c name="table" attributes=[#<Nokogiri::XML::Attr:0x8294c014 name="id" value="beta-users">] children=[#<Nokogiri::XML::Text:0x82953bc0 "\n">]>

I have tried various tricks but can't seem to dig deeper down to a lower-level child than 'table'.

At best, I can get to the lowest-level Text object by using

node.children

but

node.children.text 

returns "\n".

Despite searching for several hours, I am none the wiser about how to sort this out. Any thoughts?

Upvotes: 0

Views: 512

Answers (1)

Grych

Reputation: 2901

There is an unclosed class value in your sample; it should be:

<html>
  <body>
     <table>
        <tbody>
           <tr class="active">
             <td>name1</td>
             <td>name2</td>
             <td>name3</td>
          </tr>
       </tbody>
     </table>
  </body>
</html>

After correcting this, you can select the cells directly with a CSS selector and print each one's text:

node = Nokogiri::HTML(response).css('table tbody tr td')
node.each { |child| puts child.text }
# Output:
# name1
# name2
# name3

Upvotes: 1
