Zack Herbert

Reputation: 960

How can I strip HTML tags from a string in the model before I get to the view?

Trying to determine how to strip the HTML tags from a string in Ruby. I need this to be done in the model before I get to the view. So using:

ActionView::Helpers::SanitizeHelper#strip_tags

won't work. I was looking into using Nokogiri, but can't figure out how to do it.

If I have a string:

description = '<a href="http://google.com">google</a>'

I need it to be converted to plain text without including HTML tags so it would just come out as "google".

Right now I have the following which will take care of HTML entities:

def simple_description
  simple_description = Nokogiri::HTML.parse(self.description)
  simple_description.text
end

Upvotes: 1

Views: 2080

Answers (3)

the Tin Man

Reputation: 160551

Nokogiri is a great choice if you don't own the HTML generator and you want to reduce your maintenance load:

require 'nokogiri'
description = '<a href="http://google.com">google</a>'
Nokogiri::HTML::DocumentFragment.parse(description).at('a').text
# => "google"

The advantage of a parser over patterns is that the parser continues to work when the tags or the format of the document change, whereas patterns get tripped up by those changes.

While using a parser is a little slower, it more than makes up for that through ease of use and reduced maintenance.

The code above breaks down to:

Nokogiri::HTML(description).to_html
# => "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body><a href=\"http://google.com\">google</a></body></html>\n"

Rather than let Nokogiri add the normal HTML headers, I told it to parse only that one node into a document fragment:

Nokogiri::HTML::DocumentFragment.parse(description).to_html
# => "<a href=\"http://google.com\">google</a>"

at finds the first occurrence of that node:

Nokogiri::HTML::DocumentFragment.parse(description).at('a').to_html
# => "<a href=\"http://google.com\">google</a>"

text returns the text content of that node.

Upvotes: 1

Weibo Chen

Reputation: 379

Maybe you could use a regular expression in Ruby, like the following:

des = '<a href="http://google.com">google</a>'
p des[/<.*>(.*)\<\/.*>/,1]

The result will be "google".

Regular expressions are powerful, and you can customize the pattern to fit your needs.
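As a rough illustration of the kind of customizing that may be needed, the sketch below runs the same pattern (minus one redundant backslash) on a string with two elements, then tries a simpler tag-deleting substitution. The multi sample string and the strip_tags_naive helper are hypothetical names introduced here, not part of the question.

```ruby
# Sample inputs; `multi` is a made-up example with two elements.
single = '<a href="http://google.com">google</a>'
multi  = '<b>bold</b> and <i>italic</i>'

# The capture-group pattern: fine for a single element...
pattern = /<.*>(.*)<\/.*>/
p single[pattern, 1]   # => "google"

# ...but the greedy `.*` swallows everything up to the last element.
p multi[pattern, 1]    # => "italic"

# Simpler alternative: delete anything that looks like a tag.
def strip_tags_naive(html)
  html.gsub(/<[^>]*>/, '')
end

p strip_tags_naive(single)  # => "google"
p strip_tags_naive(multi)   # => "bold and italic"
```

Even the substitution is only an approximation: it does not decode HTML entities, and a ">" inside an attribute value would confuse it.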

Upvotes: -1

Philip Hallstrom

Reputation: 19879

You can call the sanitizer directly like this:

Rails::Html::FullSanitizer.new.sanitize('<b>bold</b>')
# => "bold"

The Rails::Html namespace includes several sanitizer classes that may be useful: FullSanitizer, LinkSanitizer, Sanitizer, and WhiteListSanitizer.

Upvotes: 2
