AnApprentice

Reputation: 110950

Rails Gem sanitize - How to whitelist &

Right now we're using the sanitize gem: https://github.com/rgrove/sanitize

The problem is that if you enter "hello & world", sanitize saves it in the DB as:

hello &amp; world

How can we whitelist the & character? We want sanitize to remove all potentially malicious HTML and JS/script tags, but we're OK with allowing the ampersand.

Ideas? Thanks

Upvotes: 14

Views: 6678

Answers (5)

SRack

Reputation: 12203

None of the other answers worked for me. The best approach I've found for my use case is the Loofah gem, which is already available in a Rails app:

good = '&'
bad = "<script>alert('I am evil');</script>"
greater_than = '>' # << my use case

Loofah.fragment(good).text(encode_special_chars: false)
# => "&"
Loofah.fragment(greater_than).text(encode_special_chars: false)
# => ">"

Loofah.fragment(bad).text(encode_special_chars: false)
# => "alert('I am evil');"

# And just for clarity, without the option passed in:
Loofah.fragment(good).text
# => "&amp;"

It's not flawless though, so be incredibly careful. Input that arrives already entity-encoded is decoded back into a literal tag:

really_bad = "&lt;script&gt;alert('I am evil');&lt;/script&gt;"
Loofah.fragment(really_bad).text(encode_special_chars: false)
# => "<script>alert('I am evil');</script>"
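One way to read that warning: the decoded string is plain text, not safe HTML, so it needs to be escaped again wherever it is rendered into a page. Below is a minimal sketch using ERB::Util.html_escape from Ruby's standard library. The stored value is a stand-in for what the Loofah call above returns, and the variable names are only illustrative.

```ruby
require 'erb'

# Stand-in for the decoded text returned by the Loofah call above.
stored = "<script>alert('I am evil');</script>"

# Escape at render time so the browser displays the text instead of running it.
rendered = ERB::Util.html_escape(stored)
puts rendered
```

Rails views already escape strings like this by default, so the risk mainly arises when the value is marked html_safe or written out through raw.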

More info on the specified method here.

Definitely the most efficient approach for what I needed to do!

Upvotes: 2

Unixmonkey

Reputation: 18784

Sanitize always encodes special characters in its output as HTML entities, so that the result is valid HTML/XHTML.

The best approach I've found is to filter the output:

Sanitize.fragment("hello & world").gsub('&amp;', '&') #=> "hello & world"

Upvotes: 6

Armando

Reputation: 31

As of Rails 4.2, #strip_tags does not decode HTML special characters:

strip_tags("fun & co")
  => "fun &amp; co"

If it did decode them, you'd get the following:

strip_tags("&lt;script&gt;")
  => "<script>"

If you only want the ampersand, I'd suggest filtering the output as @Unixmonkey suggested, and restoring only the & character:

strip_tags("<bold>Hello & World</bold>").gsub(/&amp;/, "&")
  => "Hello & World"

Upvotes: 1

Ashley Raiteri

Reputation: 710

Unixmonkey's answer is what we ended up doing.

def remove_markup(html_str)
  marked_up = Sanitize.clean html_str

  # Turn each whitelisted entity (e.g. &amp;) back into its plain character.
  ESCAPE_SEQUENCES.each do |esc_seq, ascii_seq|
    marked_up = marked_up.gsub('&' + esc_seq + ';', ascii_seq.chr)
  end
  marked_up
end

Where ESCAPE_SEQUENCES was a list of the entities we didn't want escaped, each pairing an entity name (such as amp) with the character it stands for.
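The contents of that constant aren't shown, so here is a rough, self-contained sketch of the same idea in plain Ruby. The ALLOWED_ENTITIES hash and the restore_entities helper are hypothetical names, and the input is assumed to have been sanitized already. It differs from the loop above in one respect: it does all replacements in a single gsub pass, so text that has just been restored is not scanned again by a later replacement.

```ruby
# Hypothetical whitelist: entity name => character to restore.
# Anything not listed here stays encoded.
ALLOWED_ENTITIES = {
  'amp'  => '&',
  'quot' => '"'
}.freeze

# Restores whitelisted entities in a string that has ALREADY been sanitized.
# A single gsub pass means restored text is never scanned a second time.
def restore_entities(sanitized, allowed = ALLOWED_ENTITIES)
  pattern = /&(#{Regexp.union(allowed.keys).source});/
  sanitized.gsub(pattern) { allowed.fetch(Regexp.last_match(1)) }
end

puts restore_entities('hello &amp; world')      # hello & world
puts restore_entities('&lt;b&gt; &amp; co')     # &lt;b&gt; & co
puts restore_entities('&amp;lt;script&amp;gt;') # &lt;script&gt;
```

Leaving lt and gt off the whitelist is deliberate: the last call shows that double-encoded markup comes back as entity text rather than as a tag.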

Upvotes: 2
