Aayush Jain

Reputation: 21

Parsing with embedded < and >

I have some HTML content with special characters such as <, >, and % inside the attribute values of its tags:

html_text = '<td class="web" width="56" valign="middle" style="color:#333333; font-family:Arial, Helvetica, sans-serif; font-size:12px; line-height:18px; padding-top:38px; padding-bottom:40px;"><img alt="<%= ab("###/a/j/img1_alt_text=Hey") %>" src="<%%= @dropbox_path %>/path/to/image/image.png" width="42" height="41" border="0" hspace="0" vspace="0" style="display:block; vertical-align:top;">String1</td>'

When I build an HTML document from this string using

html_doc = Nokogiri::HTML(html_text, nil, "UTF-8")

and traverse it, printing the content of each text node,

html_doc.traverse do |x|
  # Only print the content of text nodes
  if x.text?
    temp = x.content
    puts temp
  end
end

I expected 'String1' as the output, but instead it gives me:

" src="/path/to/image/image.png" width="42" height="41" border="0" hspace="0" vspace="0" style="display:block; vertical-align:top;">

Upvotes: 2

Views: 104

Answers (1)

Gaurav Dave

Reputation: 7474

Try:

html_doc.css('td')[0].text

Refer to "Parsing HTML with Nokogiri" for more information.
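
If the parsed text still contains pieces of the img tag, the likely cause is that the nested double quotes and the > inside the <%= ... %> tags end the alt attribute and the img tag early. One possible workaround is to swap each ERB tag for a plain placeholder before parsing and restore it afterwards. This is only a sketch: mask_erb and unmask_erb are hypothetical helper names (not part of Nokogiri), the ERBTOKEN<n>X placeholder format is an assumption, and the html_text below is a shortened version of the string from the question.

```ruby
# Hypothetical helper (not part of Nokogiri): replace every ERB tag
# (<% ... %>) with a plain-text token so the HTML parser never sees the
# nested quotes, "<" or ">". Returns the masked string and the saved tags.
def mask_erb(html)
  tags = []
  masked = html.gsub(/<%.*?%>/m) do |tag|
    tags << tag
    "ERBTOKEN#{tags.size - 1}X"
  end
  [masked, tags]
end

# Hypothetical helper: put the original ERB tags back into any string
# that was derived from the masked HTML.
def unmask_erb(text, tags)
  text.gsub(/ERBTOKEN(\d+)X/) { tags[Regexp.last_match(1).to_i] }
end

# Shortened version of the string from the question.
html_text = '<td class="web"><img alt="<%= ab("###/a/j/img1_alt_text=Hey") %>" src="<%%= @dropbox_path %>/path/to/image/image.png" width="42">String1</td>'

masked, tags = mask_erb(html_text)
puts masked

# The masked string can then be parsed as usual, assuming nokogiri is installed:
#   html_doc = Nokogiri::HTML(masked, nil, "UTF-8")
#   html_doc.traverse { |x| puts x.content if x.text? }
```

After masking, the alt and src values contain only ordinary characters, so the text-node traversal from the question should no longer pick up fragments of the tag. Call unmask_erb on any attribute value or serialized HTML where the original ERB tags are needed again.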

Upvotes: 1
