Reputation: 292
I am trying to parse the following HTML using Ruby and Nokogiri:
<div class="vevent">
<table width="750"><tr>
<td width="25"> </td>
<td valign="top" width="200">
<font size="2" face="sans-serif">
<font color="black"><b>June 30, 2015</b></font>
<br>
<span class="dtstart"><span class="value-title" title="2015-06-30"></span></span><br><span class="summary"><font color="#92161" size="3"><b>Band Concert</b></font></span>
<br><font color="#333333">Event</font><br>
<br>
<br>
<br clear="left">Have a question? email us.<br>
<br></font>
</td>
<td valign="top" width="10"></td>
<td valign="top">
<br clear="left"><font color="#92161">111 Main Street</font><br>
<font color="#92161">Mainstreet, Ohio 55111</font>
<a rel="nofollow" href="http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=%221700+111+MainStreet+NE+Mainstreet,+Ohio+55111%22" target="_blank"><font size="1" face="sans-serif">map link</font></a><br><br>
<font color="#92161"><font size="2" face="sans-serif">Telephone:</font> 3305551000</font><br><br>
Visit our website for complete information.<br><br>
Enjoy a summer evening concert on Main Street at 8pm. Doors and cash bar open at 7pm.<br><br>Look for more details and ticket sales to be released soon on our website<br> <br><br>
<br>
</td>
</tr></table>
</div>
I am trying to grab the last bit of text:
Visit our website for complete information.<br><br>
Enjoy a summer evening concert on Main Street at 8pm. Doors and cash bar open at 7pm.<br><br>Look for more details and ticket sales to be released soon on our website<br> <br><br>
Here is my code thus far:
events = doc.css("div.vevent")
events.collect do |row|
row.css("td")[3]
end
This gets me to the fourth td (index 3, since the list is zero-based), which contains the text I am looking for:
<td valign="top">
<br clear="left"><font color="#92161">111 Main Street</font><br>
<font color="#92161">Mainstreet, Ohio 55111</font>
<a rel="nofollow" href="http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=%221700+111+MainStreet+NE+Mainstreet,+Ohio+55111%22" target="_blank"><font size="1" face="sans-serif">map link</font></a><br><br>
<font color="#92161"><font size="2" face="sans-serif">Telephone:</font> 3305551000</font><br><br>
Visit our website for complete information.<br><br>
Enjoy a summer evening concert on Main Street at 8pm. Doors and cash bar open at 7pm.<br><br>Look for more details and ticket sales to be released soon on our website<br> <br><br>
<br>
</td>
However, once there, if I call text on that td I get all the text inside the td. I only want the last bit that is not inside any element. I tried using XPath and parent so that I could say "just give me the text that is directly inside the td (not nested inside another element)", but I couldn't get that to work. Does anyone have any ideas on this?
Upvotes: 1
Views: 80
Reputation: 5998
I suggest using XPath, which is more flexible.
If I understand you correctly, you would like:
"I only want the last bit that is not inside any element"
So, try this XPath:
//table//td[last()]/text()
Upvotes: 0