Yin Zhu

Reputation: 17119

How to convert this regular expression into Python

I want to use this regular expression in Python:

 <(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>

(from RegEx match open tags except XHTML self-contained tags)

import re

def removeHtmlTags(page):
    p = re.compile(r'XXXX')  # XXXX stands in for the expression above
    return p.sub('', page)

It seems that I cannot paste the regular expression above directly into this function in place of XXXX.

Upvotes: 0

Views: 4237

Answers (2)

mcrisc

Reputation: 819

If you need to remove HTML tags, this should do it:

import re

def removeHtmlTags(page):
    # '<' and '>' need no escaping, and re.I has no effect on a pattern without letters
    pattern = re.compile(r'<[^>]+>')
    return pattern.sub('', page)
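
As a rough check of where this simple pattern works and where it falls short, here is a minimal sketch. The names `SIMPLE_TAG` and `strip_tags_simple` are made up for illustration, and the sample strings are assumptions, not taken from the question. A `>` inside a quoted attribute value ends the match early, which is the case the longer expression in the question is meant to handle.

```python
import re

# Simple pattern: "<", then one or more characters that are not ">", then ">".
SIMPLE_TAG = re.compile(r'<[^>]+>')

def strip_tags_simple(page):
    """Remove everything that looks like <...> from page."""
    return SIMPLE_TAG.sub('', page)

# Plain markup is stripped as expected.
print(strip_tags_simple('<p>Hello <b>world</b></p>'))  # Hello world

# A ">" inside a quoted attribute ends the match too early,
# so part of the attribute is left behind.
print(strip_tags_simple('<a title="a>b">link</a>'))    # b">link
```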

Upvotes: 0

Ignacio Vazquez-Abrams

Reputation: 798746

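Works fine here. You're probably having trouble because of the quotes: the expression contains both ' and ", so either one ends an ordinary single- or double-quoted string literal early. Just triple-quote it: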

import re

def removeHtmlTags(page):
    p = re.compile(r'''<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>''')
    return p.sub('', page)
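
A quick way to see this in action is the sketch below. It uses the same expression in a triple-quoted raw string; the names `TAG` and `remove_html_tags` and the sample inputs are assumptions for illustration only. Both inputs have a `>` inside a quoted attribute value, which the quoted-string alternatives in the expression skip over.

```python
import re

# The expression from the question, written as a triple-quoted raw string
# so that the ' and " characters inside it do not end the literal.
TAG = re.compile(r'''<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>''')

def remove_html_tags(page):
    """Remove tags, treating quoted attribute values as opaque."""
    return TAG.sub('', page)

# ">" inside a double-quoted attribute value does not end the tag.
print(remove_html_tags('<a title="a>b">link</a>'))  # link

# The same holds for a single-quoted attribute value.
print(remove_html_tags("<img alt='x > y'>text"))    # text
```

Triple quoting works here because the expression never contains three consecutive single quotes and does not end in one.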

Upvotes: 3
