Reputation: 895
I am using Mechanize/Nokogiri and need to parse an HTML page that contains many tables like this one:
<table width="100%" onclick="javascript:abredown('c7a8e8041a5031f127d5d27f3f071cbb');" class="buscaDestaque" bgcolor="#F7D36A">
<tr>
<td rowspan="2" scope="col" style="width:5%"><img src="images/gold.gif" border="0"></td>
<td scope="col" style="width:45%" class="mais"><b>Community - 2nd Season</b><br />Community - 2ª Temporada<br/><b>Downloads: </b> 2496 <b>Comentários: </b>17<br><b>Avaliação: </b> 10/10</td>
<td scope="col" style="width:20%">28/03/2011 - 21:07</td>
<td scope="col" style="width:20%"><a href="javascript:abreinfousuario(1083150)">SubsOTF</a></td>
<td scope="col" style="width:10%"><img src='images/flag_br.gif' border='0'></td>
</tr>
<tr>
<td colspan="4">Release: <span class="brls">Community.S02E19.HDTV.XviD-LOL/DIMENSION</span></td>
</tr>
</table>
I want this output for each table:
Community.S02E19.HDTV.XviD-LOL/DIMENSION, ('c7a8e8041a5031f127d5d27f3f071cbb')
Can anyone help me?
Upvotes: 1
Views: 1068
Reputation: 303520
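require 'nokogiri'

# html_with_many_tables is a String holding the page's HTML
html = Nokogiri::HTML(html_with_many_tables)
results = html.css('table.buscaDestaque').map do |table|
  # Pull the quoted ID out of the onclick attribute
  jsid = table['onclick'][/'(\w+)'/, 1]
  # The release name is the text of the element with class "brls"
  brls = table.at_css('.brls').text
  "#{brls}, #{jsid}"
end
p results
#=> ["Community.S02E19.HDTV.XviD-LOL/DIMENSION, c7a8e8041a5031f127d5d27f3f071cbb",
#    "AnotherBRLS, anotherJSID"]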
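If it helps to see the ID extraction on its own, here is a minimal sketch in plain Ruby (no Nokogiri needed) using the onclick value from the question. `String#[]` with a regex and a group index returns that capture group, or nil when nothing matches. The `format_release` helper is a hypothetical name of my own, not part of any library; it only shows one way to get the parenthesised, quoted form the question asks for and to skip tables where a piece is missing.

```ruby
# The onclick value copied from the table in the question.
onclick = "javascript:abredown('c7a8e8041a5031f127d5d27f3f071cbb');"

# String#[] with a regex and a capture-group index returns that group,
# or nil when the pattern does not match.
jsid = onclick[/'(\w+)'/, 1]

# format_release is a hypothetical helper (an assumption, not a Nokogiri API).
# It returns nil when either piece is missing instead of raising.
def format_release(brls, jsid)
  return nil if brls.nil? || jsid.nil?
  "#{brls}, ('#{jsid}')"
end

line = format_release("Community.S02E19.HDTV.XviD-LOL/DIMENSION", jsid)
puts line
# prints: Community.S02E19.HDTV.XviD-LOL/DIMENSION, ('c7a8e8041a5031f127d5d27f3f071cbb')
```

Inside the `map` block you could call a helper like this in place of the string interpolation and then run `compact` on the result to drop the nil entries.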
Upvotes: 6