Sanitize JS scripts inside script tags of an HTML file in Ruby

Question

I am trying to extract the contents of an HTML file using Ruby (not RoR).

I was doing this:

require 'sanitize'
require 'nokogiri'

doc = Nokogiri::HTML(html_document)
a = Sanitize.fragment(doc.css('body'))

This extracts the contents of the <body> tag and removes all HTML tags. Unfortunately, the JS code that was inside the <script> tags still remains.

guitarman · Accepted Answer

I assume you are using the newest version of Sanitize.

html = "<style>.red{color:red;}</style><script>...</script> ... some content ...
"

Sanitize.fragment(html, :remove_contents => ['script'])
# => ".red{color:red;} ... some content ... "

Sanitize.fragment(html, :remove_contents => ['script', 'style'])
# => " ... some content ... "

Please see the :remove_contents option in the Sanitize documentation.
