mr_bulrathi

Reputation: 554

Make changes in text, except parts in angle brackets

Suppose I have the following text:

dogs are very nice <a href="http://dogs.com">read about nice dogs here</a>

I need to change everything that is not inside angle brackets, so that the text becomes:

cats are very nice <a href="http://dogs.com">read about nice cats here</a>

I found the regex \([^)]*\) and thought it might come in handy here, but it does not seem to work:

import re

s = 'dogs are very nice <a href="http://dogs.com">read about nice dogs here</a>'
s = re.sub(r'\([^)]*\)', 'cats', s)
print(s)
# Output (unchanged):
# dogs are very nice <a href="http://dogs.com">read about nice dogs here</a>

I'm sorry if this question looks lame, but I'm really new to regex. Thanks for your help.

Upvotes: 0

Views: 30

Answers (1)

Christoph Burschka

Reputation: 4689

This regex pattern doesn't seem to have anything to do with what you want - there isn't even a mention of "dog" in there, let alone angle brackets. What it does, specifically, is match a pair of round parentheses together with any text between them (e.g. (abc)).
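To make that concrete, here is a small sketch of what the pattern from the question matches. The second test string is made up for illustration; it is not from the question.

```python
import re

pattern = r'\([^)]*\)'  # "(", then anything except ")", then ")"

s = 'dogs are very nice <a href="http://dogs.com">read about nice dogs here</a>'

# The question's string contains no round parentheses, so nothing matches
# and re.sub() returns the input unchanged.
no_match = re.search(pattern, s)
unchanged = re.sub(pattern, 'cats', s)

# A made-up string that does contain parentheses (illustration only).
replaced = re.sub(pattern, 'cats', 'dogs (abc) and (more dogs)')

print(no_match)        # None
print(unchanged == s)  # True
print(replaced)        # dogs cats and cats
```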

More generally, I don't think you'll be able to use regular expressions here.

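If the HTML doesn't contain any other angle brackets (quite an assumption), you might be successful with ^[^<>]*(<[^<>]*>[^<>]*)*dogs, which should match "dogs" only if each "<" preceding it is followed by a ">". The pattern has to be anchored at the start of the string; without the anchor, the group can match zero times and "dogs" is found inside tags as well. It also matches everything from the start of the string up to that "dogs", so the prefix has to be captured and put back in the replacement.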
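Under the same assumption (every "<" and ">" in the input belongs to a tag), a negative lookahead is another way to express "not inside angle brackets". Note that this is a different pattern from the one above, offered only as a sketch: it rejects a "dogs" whose next angle bracket is a ">", which means it sits inside a tag.

```python
import re

s = 'dogs are very nice <a href="http://dogs.com">read about nice dogs here</a>'

# (?![^<>]*>) fails when the text after "dogs" reaches a ">" before any "<",
# which is the case for "dogs" inside <a href="http://dogs.com">.
# Assumption: the input has no stray "<" or ">" outside of tags.
outside_tags = re.compile(r'dogs(?![^<>]*>)')

result = outside_tags.sub('cats', s)
print(result)
# cats are very nice <a href="http://dogs.com">read about nice cats here</a>
```

This breaks as soon as an attribute value or a comment contains an angle bracket, which is why a parser is the safer choice.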

But seriously, just install something like Beautiful Soup and parse the HTML; it's easy and a lot more robust.
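As a dependency-free sketch of the parsing approach, the example below uses the standard library's html.parser instead of Beautiful Soup; the idea is the same (touch only text nodes, leave tags alone). The names TextReplacer and replace_in_text are hypothetical helpers made up for this illustration, and the sketch silently drops comments and doctype declarations, so treat it as a starting point rather than a complete solution.

```python
from html.parser import HTMLParser


class TextReplacer(HTMLParser):
    """Re-emit the HTML, replacing a word only in text nodes (hypothetical helper)."""

    def __init__(self, old, new):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.old, self.new = old, new
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Copy the tag exactly as it appeared in the source.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied verbatim as well.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Only text between tags is modified.
        self.out.append(data.replace(self.old, self.new))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def replace_in_text(html, old, new):
    parser = TextReplacer(old, new)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


s = 'dogs are very nice <a href="http://dogs.com">read about nice dogs here</a>'
result = replace_in_text(s, "dogs", "cats")
print(result)
# cats are very nice <a href="http://dogs.com">read about nice cats here</a>
```

Beautiful Soup wraps the same idea in a friendlier API and copes better with malformed markup.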

Upvotes: 1
