How do I trim the head and tail of empty tags in HTML?

Question

I need to trim the empty tags before the first tag that has text/content and after the last one. I want to control the content displayed to the client without "breaking" the layout.

<p> </p>    ~> remove
<p> </p>    ~> remove
<p> Text </p>
<p> </p>    ~> keep only this one of the empty tags
<p> Text </p>
<p> Text </p>
<p> </p>    ~> remove
<p> </p>    ~> remove
<p> </p>    ~> remove

I'm using Sanitize, which lets you pass in transformers: lambdas that receive an environment hash (env) for each node and can modify or remove that node. Its documentation includes an example transformer that removes all empty elements.

To remove the empty elements that come before the first non-empty element, I thought I could use a flag that stops the removal as soon as a non-empty element is found:

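should_remove_empty = true

# Passed to Sanitize via its :transformers config option
remove_leading_empty = lambda do |env|
  node = env[:node]
  return unless node.elem?

  # Non-empty: has a non-blank text child or any non-text child
  has_content = node.children.any? { |c| !c.text? || !c.content.strip.empty? }

  if has_content
    should_remove_empty = false  # first non-empty element: stop removing
  elsif should_remove_empty
    node.unlink                  # still in the leading run of empty elements
  end
end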

To remove the trailing empty elements, however, I would have to walk the nodes in reverse order, and Sanitize doesn't offer a way to do that.
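Outside Sanitize, the forward-then-reverse pass is easy to express once the elements are in an ordinary list. The sketch below is plain Ruby and assumes you already have each element's text as a string (the sample strings are hypothetical stand-ins for the elements above). Array#drop_while plays the role of the should_remove_empty flag, and running it again on the reversed list handles the tail.

```ruby
# Sketch: trim blank strings from both ends of a list while keeping
# blank strings that sit between non-blank ones. The strings stand in
# for the elements' text content (hypothetical sample data below).
def trim_blank_ends(texts)
  blank = ->(s) { s.strip.empty? }

  # Forward pass: drop the leading blanks, like the should_remove_empty flag
  head_trimmed = texts.drop_while(&blank)

  # Reverse pass: flip the list, drop its leading blanks (the original
  # trailing ones), then flip it back to restore the order
  head_trimmed.reverse.drop_while(&blank).reverse
end

texts = [" ", " ", " Text ", " ", " Text ", " Text ", " ", " ", " "]
p trim_blank_ends(texts)
# => [" Text ", " ", " Text ", " Text "]
```

The blank entry between the two texts survives because drop_while stops at the first non-blank item in each direction.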

Does anyone know how to do this, or has anyone already implemented it?

7stud · Accepted Answer

You're using Sanitize (https://github.com/rgrove/sanitize).

From the README:

Sanitize is a whitelist-based HTML sanitizer. Given a list of acceptable elements and attributes, Sanitize will remove all unacceptable HTML from a string.

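That won't work for you, because whether an empty element should be removed depends on its position, not on its type: you want to keep some empty elements (the ones between elements with text) and remove others.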

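require 'nokogiri'

doc = Nokogiri::HTML(<<END_OF_HTML)
<p> </p>
<p> </p>
<p> Text </p>
<p> </p>
<p> Text </p>
<p> Text </p>
<p> </p>
<p> </p>
<p> </p>
END_OF_HTML

ps = doc.xpath('/html/body/p')

first_text = nil  # index of the first <p> with text
last_text = nil   # index of the last <p> with text

ps.each_with_index do |p, i|
  # p.text is "" (not nil) when the <p> has no text node at all
  next if p.text.strip.empty?

  first_text ||= i
  last_text = i
end

# Keep everything from the first to the last non-empty <p>, including
# the empty ones in between; print nothing if every <p> is empty
puts ps.slice(first_text..last_text) if first_text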

--output:--
<p> Text </p>
<p> </p>
<p> Text </p>
<p> Text </p>
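A more compact way to find the same bounds is Array#index and Array#rindex with a block. The sketch below is plain Ruby; content_range is a hypothetical helper name, and it returns nil when every element is blank so the caller can skip printing. As a design choice it treats a non-breaking space (what &nbsp; decodes to) as blank, which String#strip does not do.

```ruby
# Sketch: return the index range from the first to the last string that
# contains visible text, or nil if every string is blank.
# content_range is a hypothetical helper, not part of Nokogiri or Sanitize.
def content_range(texts)
  # [[:space:]] also matches U+00A0 (what &nbsp; decodes to), which
  # String#strip leaves in place, so "\u00A0" counts as blank here
  has_text = ->(s) { s.match?(/[^[:space:]]/) }

  first = texts.index(&has_text)
  return nil if first.nil?   # no element has text

  last = texts.rindex(&has_text)
  first..last
end

texts = [" ", " ", " Text ", " ", " Text ", " Text ", " ", " ", " "]
range = content_range(texts)
p range                          # => 2..5
p texts[range]                   # => [" Text ", " ", " Text ", " Text "]
p content_range([" ", "\u00A0"]) # => nil
```

With the ps node set from the answer, calling content_range(ps.map(&:text)) and passing a non-nil result to ps.slice should select the same elements.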
