Parsing HTML document

Question

I am trying to parse the following HTML using Ruby and Nokogiri:



 


June 30, 2015



Band Concert

Event






Have a question? email us.







111 Main Street

Mainstreet, Ohio 55111
map link


Telephone: 3305551000


Visit our website for complete information.


Enjoy a summer evening concert on Main Street at 8pm. Doors and cash bar open at 7pm.

Look for more details and ticket sales to be released soon on our website

I am trying to grab the last bit of text:

Visit our website for complete information.


Enjoy a summer evening concert on Main Street at 8pm. Doors and cash bar open at 7pm.

Look for more details and ticket sales to be released soon on our website

Here is my code thus far:

events = doc.css("div.vevent")
events.collect do |row|
  row.css("td")[3]  
end

This gets me to the td at index 3 (the fourth td in the row, since indexing is zero-based), which contains the text I am looking for, as follows:



111 Main Street

Mainstreet, Ohio 55111
map link


Telephone: 3305551000


Visit our website for complete information.


Enjoy a summer evening concert on Main Street at 8pm. Doors and cash bar open at 7pm.

Look for more details and ticket sales to be released soon on our website

However, once there, if I call text on that td, I get all the text inside the td, including the text of its nested elements. I only want the last bit, which is not inside any child element. I tried using XPath and parent so that I could say "just give me the text that is directly inside the td (not nested inside another element)", but I couldn't get that to work. Does anyone have any ideas on this?

igor_rb · Accepted Answer

Try this code: doc.css('td')[3].css('> text()').to_s.strip
