Reputation: 960
I'm trying to figure out how to strip the HTML tags from a string in Ruby. I need this done in the model, before I get to the view, so using:
ActionView::Helpers::SanitizeHelper#strip_tags()
won't work. I was looking into using Nokogiri, but can't figure out how to do it.
If I have a string:
description = '<a href="http://google.com">google</a>'
I need it to be converted to plain text without including HTML tags so it would just come out as "google".
Right now I have the following, which takes care of HTML entities:
def simple_description
  doc = Nokogiri::HTML.parse(description)
  doc.text
end
Upvotes: 1
Views: 2080
Reputation: 160551
Nokogiri is a great choice if you don't own the HTML generator and you want to reduce your maintenance load:
require 'nokogiri'
description = '<a href="http://google.com">google</a>'
Nokogiri::HTML::DocumentFragment.parse(description).at('a').text
# => "google"
The advantage of a parser over patterns is that the parser keeps working when the tags or the format of the document change, whereas patterns get tripped up by those changes.
Although a parser is a little slower, it more than makes up for that with ease of use and reduced maintenance.
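Here is how the code above breaks down. Parsing the string as a full document makes Nokogiri wrap it in the usual HTML boilerplate: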
Nokogiri::HTML(description).to_html
# => "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body><a href=\"http://google.com\">google</a></body></html>\n"
Rather than let Nokogiri add the normal HTML headers, I told it to parse only that one node into a document fragment:
Nokogiri::HTML::DocumentFragment.parse(description).to_html
# => "<a href=\"http://google.com\">google</a>"
The at method finds the first node that matches the selector:
Nokogiri::HTML::DocumentFragment.parse(description).at('a').to_html
# => "<a href=\"http://google.com\">google</a>"
The text method returns the text inside that node, with the tags removed.
Upvotes: 1
Reputation: 379
Maybe you could use a regular expression in Ruby, like the following:
des = '<a href="http://google.com">google</a>'
p des[/<.*>(.*)\<\/.*>/,1]
The result will be "google"
Regular expressions are powerful, and you can customize this one to fit your needs.
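A sketch of the pattern above next to a more common tag-removing gsub. The gsub variant and the tricky input are my additions, not part of this answer. Both are plain Ruby with no gems, and both assume a > never appears inside a tag; a quoted > in an attribute value leaves markup behind, which is the kind of input where a parser is safer.

```ruby
# The answer's capture-group pattern works for a single, simple tag pair.
des = '<a href="http://google.com">google</a>'
captured = des[/<.*>(.*)<\/.*>/, 1] # => "google"

# A common variant removes every tag with a "anything but >" character class.
stripped = des.gsub(/<[^>]*>/, '') # => "google"

# A '>' inside a quoted attribute value ends the first match early,
# so part of the tag survives in the output.
tricky = '<a title="1 > 0">x</a>'
broken = tricky.gsub(/<[^>]*>/, '') # => " 0\">x"
```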
Upvotes: -1
Reputation: 19879
You can call the sanitizer directly like this:
Rails::Html::FullSanitizer.new.sanitize('<b>bold</b>')
# => "bold"
There are also other sanitizer classes that may be useful, such as LinkSanitizer and WhiteListSanitizer (renamed SafeListSanitizer in newer versions of rails-html-sanitizer); Sanitizer is their common base class.
Upvotes: 2