michaelmichael

Reputation: 14125

A better way to remove blank lines after Nokogiri Node removal

Perhaps this is nitpicky, but I have to ask.

I'm using Nokogiri to parse XML, remove certain tags, and write over the original file with the results. Using .remove leaves blank lines in the XML. I'm currently using a regex to get rid of the blank lines. Is there some built-in Nokogiri method I should be using?

Here's what I have:

require 'nokogiri'
io_path = "/path/to/metadata.xml"
io = File.read(io_path)
document = Nokogiri::XML(io)
document.xpath('//artwork_files', '//tracks', '//previews').remove

# write to file and remove blank lines with a regular expression
File.open(io_path, 'w') do |x|
  x << document.to_s.gsub(/\n\s+\n/, "\n")
end

Upvotes: 9

Views: 5792

Answers (3)

digitalronin

Reputation: 636

This removed blank lines for me:

doc.xpath('//text()').find_all { |t| t.to_s.strip == '' }.each(&:remove)

Upvotes: 3

Mike Ciul

Reputation: 11

Doing a substitution on each text node didn't work for me either. The problem is that after removing nodes, text nodes that just became adjacent don't get merged. When you loop over text nodes, each one has only a single newline, but there are now several of them in a row.

One rather messy solution I found was to reparse the document:

xml = Nokogiri::XML.parse xml.to_xml

Now adjacent text nodes will be merged and you can do regexes on them.

But this looks like it's probably a better option:

https://github.com/tobym/nokogiri-pretty

Upvotes: 1

akuhn

Reputation: 27793

There is no built-in method, but we can add one:

class Nokogiri::XML::Document
  def remove_empty_lines!
    xpath('//text()').each do |text|
      text.content = text.content.gsub(/\n(\s*\n)+/, "\n")
    end
    self
  end
end
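
To illustrate how this pattern differs from the /\n\s+\n/ used in the question, here is a minimal sketch on plain strings, with no Nokogiri involved. The constant and method names are hypothetical, chosen only for this example.

```ruby
# Pattern from the question: needs at least one whitespace character
# between the two newlines it anchors on.
QUESTION_PATTERN = /\n\s+\n/

# Pattern from this answer: \s* also matches the empty string, so two
# directly adjacent newlines are collapsed as well.
ANSWER_PATTERN = /\n(\s*\n)+/

# Hypothetical helper: replace every match with a single newline.
def squeeze_blank_lines(text, pattern)
  text.gsub(pattern, "\n")
end

indented_blank = "a\n  \n  b"  # blank line that contains spaces
empty_blank    = "a\n\nb"      # blank line with nothing in it

p squeeze_blank_lines(indented_blank, QUESTION_PATTERN) # → "a\n  b"
p squeeze_blank_lines(indented_blank, ANSWER_PATTERN)   # → "a\n  b"
p squeeze_blank_lines(empty_blank, QUESTION_PATTERN)    # → "a\n\nb" (unchanged)
p squeeze_blank_lines(empty_blank, ANSWER_PATTERN)      # → "a\nb"
```

Both patterns handle a blank line that contains indentation, but only the second one also removes a completely empty line.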

Upvotes: 7
