modulaaron
modulaaron

Reputation: 2906

Need to remove newlines from object/embed tags only using Nokogiri

I need to remove newlines from any object/embed tags. I am currently attempting to do so using Nokogiri by doing the following:

s = "<div>
<object height='450' width='600'>
<param name='allowfullscreen' value='true'>
<param name='allowscriptaccess' value='always'>
<param name='movie' value='http://vimeo.com/moogaloop.swf?clip_id=3317924&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1'>
<embed src='http://vimeo.com/moogaloop.swf?clip_id=3317924&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1' type='application/x-shockwave-flash' allowfullscreen='true' allowscriptaccess='always' height='450' width='600'>
</embed>
</object>
</div>"
doc = Nokogiri::HTML(s)
doc.css('object').each { |o| o.inner_html.gsub!(/\n/, ""); puts o.inner_html }

Please note that the example is for object tags only.

Printing o.inner_html at the end of the block shows that no replacement has occurred, even though the gsub text appears correct. Also, once that part is resolved, I need to make sure that the actual object node in the doc object is saved with the updated values.

Any help is most appreciated. Thanks.

Upvotes: 3

Views: 4230

Answers (1)

Phrogz
Phrogz

Reputation: 303215

Got it!

require 'nokogiri'
s = <<ENDHTML
<div>
<object height='450' width='600'>
  <param name='allowfullscreen' value='true'><param name='allowscriptaccess' value='always'>
  <param name='movie' value='http://vimeo.com/moogaloop.swf?clip_id=3317924&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1'>
<embed src='http://vimeo.com/moogaloop.swf?clip_id=3317924&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1' type='application/x-shockwave-flash' allowfullscreen='true' allowscriptaccess='always' height='450' width='600'>
</embed>
</object>
</div>
ENDHTML

doc = Nokogiri::HTML(s)
doc.css('object,embed').each{ |e| e.inner_html = e.inner_html.gsub(/\n/,'') }
puts doc.serialize( save_with: 0 )

#=> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
#=> <html><body><div>
#=> <object height="450" width="600"><param name="allowfullscreen" value="true"><param name="allowscriptaccess" value="always"><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=3317924&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1"><embed src="http://vimeo.com/moogaloop.swf?clip_id=3317924&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" height="450" width="600"></embed></object>
#=> </div></body></html>
  1. Removing all text nodes does not fully clean the document; you need to use the inner_html.
  2. Calling inner_html.gsub! is not the same as inner_html = inner_html.gsub.
  3. As shown, you need to use serialize with the hash :save_with => 0 passed in to prevent Nokogiri from generating newlines between tags in the output.

Upvotes: 8

Related Questions