Stupid.Fat.Cat

Reputation: 11285

Python sanitizing html from a string

Is there a way to escape all single and double quotes in a string?

For example if I have a string like this:

Hi my name is 'Shelby"

Is there a way to preprocess this string so that the quotes are escaped?

EDIT:

Maybe that wasn't the best way to frame the problem. Here's what I'm actually trying to do: I have a tool that analyzes SWF files (namely swftools -> swfdump), and I output its results to a web page. Some malicious SWF files contain HTML tags, which then end up in that page. Is there a way to sanitize these HTML tags in Python?

Sample of string:

 (    3 bytes) action: Push Lookup16:443 ("title_txt")
 (    0 bytes) action: GetMember
 (    6 bytes) action: Push Lookup16:444 ("htmlText") Lookup16:445 ("Please check your Log In info.")
 (    0 bytes) action: SetMember
 (   14 bytes) action: Push int:2 int:1 register:1 Lookup:30 ("login_mc")

The part that reads "Please check your Log In info." is actually supposed to say: font color = '#ff0000'

Upvotes: 1

Views: 5913

Answers (3)

G M

Reputation: 22449

I think the current approach is to use the standard-library html module (html.escape is available from Python 3.2 on).

import html
html.escape('Hi my name is \'Shelby\"')
Out: 'Hi my name is &#x27;Shelby&quot;'
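
Applied to the swfdump case from the question, a minimal sketch might look like the following. The sample line is made up for illustration: the exact markup in the real dump is not shown in the question, so the font tag here is only a guess, and wrapping the result in a pre element is likewise an assumption about how the page displays the dump.

```python
import html

# Hypothetical swfdump line containing markup (assumed for illustration;
# the real string from the question is not fully visible).
line = "Lookup16:445 (\"<font color='#ff0000'>Please check your Log In info.</font>\")"

# html.escape replaces &, < and >, and with quote=True (the default)
# also the double and single quote characters.
safe = html.escape(line)

# Wrapping in <pre> is an assumption about how the results page shows the dump.
page_fragment = "<pre>" + safe + "</pre>"
print(page_fragment)
```

After escaping, the browser shows the tag as literal text instead of interpreting it.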

Upvotes: 1

exhuma

Reputation: 21687

If you use a templating engine like Jinja or Genshi, it can do this for you. Text embedded into the page is escaped unless you explicitly tell the engine not to (in Jinja, autoescaping has to be enabled on the environment; some frameworks, such as Flask, turn it on for HTML templates). When building web pages, it is a good idea to use a templating engine anyway.
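
The escape-by-default idea can be illustrated with the standard library alone. This is only a rough sketch of the principle, not the API of Jinja or Genshi; the names render and Raw are hypothetical helpers invented for this example.

```python
import html
from string import Template


class Raw(str):
    """Hypothetical marker: a string the caller has explicitly declared safe."""


def render(template_text, **values):
    # Escape every value unless it was explicitly wrapped in Raw.
    safe = {
        name: value if isinstance(value, Raw) else html.escape(str(value))
        for name, value in values.items()
    }
    return Template(template_text).substitute(safe)


print(render("<p>$msg</p>", msg="<b>hi</b>"))       # escaped by default
print(render("<p>$msg</p>", msg=Raw("<b>hi</b>")))  # explicitly left as-is
```

The point of the design is that forgetting to escape becomes impossible by accident: unescaped output has to be requested explicitly.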

Upvotes: 0

Alyssa Haroldsen

Reputation: 3731

If you're just going for HTML sanitizing, you can try the function below. It is probably the easiest approach if you want to add more escape types:

def escape(htmlstring):
    # Map each special character to its HTML entity.
    escapes = {'\"': '&quot;',
               '\'': '&#39;',
               '<': '&lt;',
               '>': '&gt;'}
    # This is done first to prevent escaping other escapes.
    htmlstring = htmlstring.replace('&', '&amp;')
    # items() works on Python 2 and 3; iteritems() exists only on Python 2.
    for seq, esc in escapes.items():
        htmlstring = htmlstring.replace(seq, esc)
    return htmlstring

This replaces every instance of &, ', ", <, and > with their correct HTML escape codes.
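
If the ordering requirement (handling & before everything else) feels fragile, a single-pass variant is possible on Python 3 with str.translate. This is a sketch of an alternative, not the code from the answer; the names escape_once and _HTML_ESCAPES are made up for this example.

```python
# Single-pass variant: str.translate maps each input character independently,
# so the order of the entries no longer matters.
_HTML_ESCAPES = str.maketrans({
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;',
})


def escape_once(text):
    """Return text with the five HTML-special characters replaced."""
    return text.translate(_HTML_ESCAPES)


print(escape_once("Hi my name is 'Shelby\""))
# prints: Hi my name is &#39;Shelby&quot;
```

Adding another escape type is a matter of adding one more entry to the table.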

More information on HTML escaping:

Wikipedia HTML Page

Every Escape imaginable

Happy Escaping!

Upvotes: 4
