Reputation: 70
What I want to do is strip the markup from my HTML file (local for now) and then output the text to a file. That part works, but it removes all the spacing. For example, I have an H1 tag with content and a P tag; when I run the code below, the stripped text is placed in the file, but it is all on a single line. I want it to be broken into multiple lines.
require "rubygems"
require "nokogiri"
my_html = open("./my_html.html")
File.open("./no_html.txt", "a+") do |file|
  file.puts Nokogiri::HTML(my_html).text
end
Upvotes: 1
Views: 1277
Reputation: 8027
If you want to split up the string returned by Nokogiri::HTML(my_html).text, you can use String#scan:
> "abcdefghijklmnpqrstuvwxyzfdsafadfasfadsfafdasfadfasdfasdfasdfdsf".scan(/.{5}/)
=> ["abcde", "fghij", "klmnp", "qrstu", "vwxyz", "fdsaf", "adfas", "fadsf", "afdas", "fadfa", "sdfas", "dfasd"]
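Note that /.{5}/ only matches full five-character chunks, so the trailing "fdsf" is dropped in the output above. Here is a minimal sketch of how the pieces could be turned into separate lines, assuming the text contains no newlines of its own. The variable names text, chunks, and wrapped are made up for illustration, and the literal string stands in for whatever Nokogiri::HTML(my_html).text returns:

```ruby
# Stand-in for the string returned by Nokogiri::HTML(my_html).text (assumption).
text = "abcdefghijklmnpqrstuvwxyzfdsafadfasfadsfafdasfadfasdfasdfasdfdsf"

# /.{1,5}/ matches up to five characters, so the short final chunk is kept.
chunks = text.scan(/.{1,5}/)

# Joining with "\n" puts each chunk on its own line.
wrapped = chunks.join("\n")

puts wrapped
```

Passing wrapped to file.puts in place of the raw text would write one chunk per line. The chunk width of 5 is only there to match the example above; a larger number is probably more useful for real text.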
If you want to beautify the HTML, use Nokogiri::HTML(my_html, &:noblanks), as described in the SO post @Mircea pointed out in the comments.
Upvotes: 1