Casey Clayton

Reputation: 70

Strip HTML with Nokogiri, keeping spacing

OK, what I want to do is strip the markup from my HTML file (a local one for now) and then output the text to a file. That part works, but it removes all the spacing. For example, I have an H1 tag with content and a P tag. When I run the code below, the stripped text is written to the file, but it is all on a single line; I want it to be broken into multiple lines.

require "rubygems"
require "nokogiri"

# Read the local HTML file into a string
my_html = File.read("./my_html.html")
# Append the extracted text to the output file
File.open("./no_html.txt", "a+") do |file|
  file.puts Nokogiri::HTML(my_html).text
end

Upvotes: 1

Views: 1277

Answers (1)

Kenny Meyer

Reputation: 8027

If you want to split up the string which is returned from Nokogiri::HTML(my_html).text, you may use String#scan:

> "abcdefghijklmnpqrstuvwxyzfdsafadfasfadsfafdasfadfasdfasdfasdfdsf".scan(/.{5}/)
 => ["abcde", "fghij", "klmnp", "qrstu", "vwxyz", "fdsaf", "adfas", "fadsf", "afdas", "fadfa", "sdfas", "dfasd"]
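Note that the pattern /.{5}/ only matches full groups of five characters, so any shorter remainder at the end of the string is silently dropped (the trailing "fdsf" in the output above). A minimal sketch of the difference, using a made-up sample string and the variable names `fixed`, `chunks` and `wrapped` purely for illustration:

```ruby
# Hypothetical sample input; in the question this would be the
# string returned by Nokogiri::HTML(my_html).text.
text = "The quick brown fox"

# Exactly 5 characters per match: the leftover " fox" is lost.
fixed = text.scan(/.{5}/)

# 1 to 5 characters per match: the shorter final chunk is kept.
chunks = text.scan(/.{1,5}/)

# Join the chunks with newlines to get one chunk per line.
wrapped = chunks.join("\n")
puts wrapped
```

Keep in mind that this wraps at a fixed width, so it can cut words in half, and that `.` does not match newline characters unless the regex uses the /m flag.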

If you want to beautify the HTML, use

 Nokogiri::HTML(my_html, &:noblanks)

as explained in the SO post that @Mircea linked in the comments.

Upvotes: 1
