Beautiful Soup / regex matching over multiple lines

Question

I have an RSS indexing app written in Python that stores the RSS content as a blurb in the DB. When the app initially processed the article contents, it commented out all links that didn't match certain criteria. For example:

Google

Became:

 Google

Now I need to process all these old articles and modify the links. So using BeautifulSoup 4 I can easily find the comments using:

import re
from bs4 import Comment

# collect every HTML comment node in the parsed article
links = soup.find_all(string=lambda text: isinstance(text, Comment))
for link in links:
    # strip the tags so only the link text remains
    text = re.sub('<[^>]*>', '', link.string)
    # any html in the link tag was escaped by BS4, so need to convert back
    text = text.replace('&lt;', '<')
    text = text.replace('&gt;', '>')
    find = link.string + " " + text

The output of "find" above is:

 Google

Which makes it easier to perform a .replace() on the content.

Now the problem I'm having (and I'm sure this is simple) is multi-line find/replacing. When Beautiful Soup initially commented out the links, some were converted to:

 Google

or

 
Google

So obviously, replace(old,new) won't work since replace() doesn't cover multi-lines.

Can someone help me out with a regex multi-line find/replace? It should be case-sensitive.
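
For illustration, here is a minimal sketch of one way such a multi-line, case-sensitive find/replace could look. It builds a regex from the search snippet in which every run of whitespace matches any whitespace (including newlines), and optional whitespace is tolerated next to `<` and `>`. The helper names `flexible_pattern` and `replace_flexible`, and the example.com link, are hypothetical and not part of the app described above; this is an assumption about the shape of the stored data, not a definitive solution.

```python
import re


def flexible_pattern(snippet):
    """Compile a case-sensitive regex that matches `snippet` even when the
    target text has different whitespace or line breaks around tags."""
    parts = []
    # Split on whitespace runs and on tag delimiters, keeping the separators.
    for token in re.split(r"(\s+|[<>])", snippet.strip()):
        if not token:
            continue
        if token.isspace():
            parts.append(r"\s+")        # any whitespace run, newlines included
        elif token == "<":
            parts.append(r"\s*<")       # tolerate a line break before a tag
        elif token == ">":
            parts.append(r">\s*")       # tolerate a line break after a tag
        else:
            parts.append(re.escape(token))  # literal text, matched exactly
    if not parts:
        raise ValueError("snippet must contain non-whitespace text")
    # Do not swallow whitespace that sits outside the matched snippet.
    if parts[0] == r"\s*<":
        parts[0] = "<"
    if parts[-1] == r">\s*":
        parts[-1] = ">"
    return re.compile("".join(parts))


def replace_flexible(content, old, new):
    """Replace every whitespace-tolerant occurrence of `old` with `new`."""
    # A function replacement keeps backslashes in `new` from being
    # interpreted as regex escapes.
    return flexible_pattern(old).sub(lambda match: new, content)


# Hypothetical example data: a commented-out link broken across lines.
stored = 'before <!--\n<a href="http://example.com">\nGoogle</a>\n--> after'
old = '<!-- <a href="http://example.com">Google</a> -->'
new = '<a href="http://example.com">Google</a>'
updated = replace_flexible(stored, old, new)
```

Matching is case-sensitive because no `re.IGNORECASE` flag is passed, and `re.escape` keeps characters such as `.` in the URL from acting as regex metacharacters.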
