How to get Nokogiri inner_HTML object to ignore/remove escape sequences

Question

I am trying to get the inner HTML of an element on a page using Nokogiri. However, I'm not getting just the text of the element; I'm also getting the escape sequences (whitespace characters) around it. Is there a way I can suppress or remove them with Nokogiri?

require 'nokogiri'
require 'open-uri'

# URI.open is required on Ruby 3.0+, where Kernel#open no longer accepts URLs
page = Nokogiri::HTML(URI.open("http://the.page.url.com"))

page.at_css("td[custom-attribute='foo']").parent.css('td').css('a').inner_html

This returns => " TheActuallyInnerContentThatIWant "

What is the most effective and direct Nokogiri (or Ruby) way of doing this?

Aleksei Matiushkin · Accepted Answer

page.at_css("td[custom-attribute='foo']")
    .parent
    .css('td')
    .css('a')
    .text               # you need the text content, not inner_html
    .strip              # removes leading and trailing whitespace from the result

See String#strip: it removes leading and trailing whitespace, including newlines and tabs.
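
To illustrate what String#strip does and does not remove, here is a minimal sketch in plain Ruby. The `raw` and `multi` strings are made-up stand-ins for whatever the page actually returns; the real whitespace may differ.

```ruby
# Hypothetical value standing in for what .text or .inner_html might return.
raw = "\n\t  TheActuallyInnerContentThatIWant \n\t"

# strip removes leading and trailing whitespace only.
cleaned = raw.strip
puts cleaned.inspect

# Whitespace inside the string is left alone by strip, so collapse it
# separately if the element's text spans several lines (assumed example).
multi = "\n  first\n\t second  \n"
collapsed = multi.strip.gsub(/\s+/, ' ')
puts collapsed.inspect
```

If only one side needs trimming, String#lstrip and String#rstrip work the same way on the left and right ends respectively.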

Sidenote: css('td a') is likely more efficient than css('td').css('a').
