TheGoodMax

Reputation: 1

Having problems parsing HTML with Nokogiri

I'm parsing an HTML file from a website, but I'm having trouble getting all of the data out of it. One part of the file looks like this:

<tr>
<td class="color_line1" valign="center" align="left">Cemopel - Cm Petroleo Ltda.</td>
<td class="color_line1" valign="center" align="left">Avenida Rui Barbosa, 879 0</td>
<td class="color_line" valign="left"><a class="linkpadrao" href="javascript:Direciona('GRA%C3%87AS');">Graças</a></td>
<td class="color_line" valign="center" align="center">SHELL</td>
<td class="color_line" valign="center" align="center">2,899</td>
<td class="color_line" valign="center" align="center"> - </td>
<td class="color_line" valign="center" align="center">-</td>
<td class="color_line" valign="center" align="center">-</td>
<td class="color_line" valign="center" align="center">04/09/2013</td>
</tr>

And another part of the file:

<tr>
<td class="lincol" valign="center" align="left">E.u. Ten#### Neto Combustíveis</td>
<td class="lincol" valign="center" align="left">Avenida Marechal Mascarenhas de Morais, 4900 </td>
<td valign="left"><a class="linkpadrao" href="javascript:Direciona('IMBIRIBEIRA');">Imbiribeira</a></td>
<td valign="center" align="center">COSAN COMBUSTÍVEIS</td>
<td valign="center" align="center">2,899</td>
<td valign="center" align="center">2,505</td>
<td valign="center" align="center">CIF</td>
<td valign="center" align="center">-</td>
<td valign="center" align="center">04/09/2013</td>
</tr>
<tr>

I'm locating each row through the link with the 'linkpadrao' class and using its parent to get the data. This works for the cells after the link, but I can't get the data from the cells that come before it:

posto.parent.search('~ td').map &:text

Any ideas?

Upvotes: 0

Views: 115

Answers (1)

Justin Ko

Reputation: 46826

In the CSS selector ~ td, the ~ is the general sibling combinator. The sibling combinators (general ~ and adjacent +) only match siblings that come after the reference node, which is why you cannot get the previous td elements. CSS selectors do not have a preceding-sibling combinator.

Since you want all of the td elements, you could go up one more parent to the tr element and then grab all td elements:

posto.parent.parent.search('td').map &:text
#=> E.u. Ten#### Neto Combustíveis
#=> Avenida Marechal Mascarenhas de Morais, 4900 
#=> Imbiribeira
#=> COSAN COMBUSTÍVEIS
#=> 2,899
#=> 2,505
#=> CIF
#=> -
#=> 04/09/2013

Note that I am assuming posto is the link node.

Alternatively, you can use XPath, which does have a preceding-sibling axis. However, in this case it is not as tidy:

posto.parent.xpath('./following-sibling::td|preceding-sibling::td').map &:text

Upvotes: 1
