quantity

Reputation: 4181

Is there a way to remove JavaScript code from an HTML document?

I want to remove all the JavaScript code from an HTML document and keep the actual text. Is there a regex or Python script that can do this? Thanks.

Upvotes: 4

Views: 3159

Answers (3)

Nishant

Reputation: 3005

You can write a regex that looks for '<script' and 'script>' and removes everything in between.

Edit: As @cHao points out, regexes are a poor tool for parsing HTML.

A regex might still be useful in places where you have full control over the HTML.
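
For the controlled case, a minimal Python sketch of that regex could look like the following. It assumes every script element is well-formed and properly closed; the names SCRIPT_RE and strip_scripts are made up for illustration.

```python
import re

# Matches <script ...> ... </script>, case-insensitively and across newlines.
# Assumption: each script element is well-formed and has a closing tag.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def strip_scripts(html):
    """Return html with every <script>...</script> element removed."""
    return SCRIPT_RE.sub("", html)


page = '<p>Hello</p><script type="text/javascript">alert("hi");</script><p>World</p>'
print(strip_scripts(page))  # prints <p>Hello</p><p>World</p>
```

This only removes script elements. Inline handlers such as onclick="..." and javascript: URLs are left in place, and an attribute value containing '>' would confuse the pattern, which is part of why a real parser is the safer choice for HTML you do not control.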

Upvotes: 1

Mehrdad

Reputation: 2116

You can use this jQuery code to remove the script elements:

$('script').remove();

and use Firebug to inject the jQuery code into the web page:

>>> var x = window.open(""); 
Window opened 
>>> x 
Window about:blank 
>>> x.document 
Document about:blank 
>>> x.document.write("$(javascript).html('')"); 
Alert popped up

Upvotes: 0

icktoofay

Reputation: 129011

Using BeautifulSoup:

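#!/usr/bin/env python3
# Requires the third-party beautifulsoup4 package.
from bs4 import BeautifulSoup

# Parse the input document with the standard-library parser backend.
with open("with-scripts.html", "r") as f:
    soup = BeautifulSoup(f.read(), "html.parser")

# soup("script") is shorthand for soup.find_all("script").
for script in soup("script"):
    script.extract()  # detach the <script> element from the tree

# Write the script-free document back out.
with open("without-scripts.html", "w") as f:
    f.write(soup.prettify())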
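
Since the question asks to keep only the actual text, here is a rough sketch of the same idea using just the standard library's html.parser, for cases where installing BeautifulSoup is not an option. The class TextWithoutScripts and the helper visible_text are hypothetical names for illustration, not part of BeautifulSoup or any library.

```python
from html.parser import HTMLParser


class TextWithoutScripts(HTMLParser):
    """Collect text content, skipping anything inside <script> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._in_script = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased from HTMLParser.
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Keep text only when we are outside a <script> element.
        if not self._in_script:
            self._chunks.append(data)

    def text(self):
        # Join the collected pieces and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def visible_text(html):
    """Return the text of html with script contents left out."""
    parser = TextWithoutScripts()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = "<html><body><h1>Title</h1><script>alert('hi');</script><p>Body text.</p></body></html>"
print(visible_text(sample))  # prints Title Body text.
```

This sketch only skips script elements; the contents of style elements would still show up in the output and would need the same treatment if they matter.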

Upvotes: 5
