Sidd

Reputation: 1208

python replace multiple words retaining case

I'm trying to write a filter in Django that highlights words based on a search query. For example, if my string is "this is a sample string that I want to highlight using my filter" and my search stubs are "sam" and "ring", my desired output would be:

this is a <mark>sam</mark>ple st<mark>ring</mark> that I want to highlight using my filter

I'm using the answer from here to replace multiple words. I've presented the code below:

import re

words = search_stubs.split()
rep = dict((re.escape(k), '<mark>%s</mark>'%(k)) for k in words)
pattern = re.compile('|'.join(rep.keys()))
text = pattern.sub(lambda m : rep[re.escape(m.group(0))], text_to_replace)

However, this breaks when the case of the text differs from the case of the search stub. For example, if I have the string "Check highlight function" and my search stub is "check", the code above does not produce the highlighted output.

The desired output in this case would naturally be:

<mark>Check</mark> highlight function

Upvotes: 0

Views: 770

Answers (2)

Avinash Raj

Reputation: 174706

You don't need a dictionary here. The (?i) inline flag (the case-insensitive modifier) makes the whole pattern match case-insensitively.

>>> import re
>>> s = "this is a sample string that I want to highlight using my filter"
>>> l = ['sam', 'ring']
>>> re.sub('(?i)(' + '|'.join(map(re.escape, l)) + ')', r'<mark>\1</mark>', s)
'this is a <mark>sam</mark>ple st<mark>ring</mark> that I want to highlight using my filter'

Example 2:

>>> s = 'Check highlight function'
>>> l = ['check']
>>> re.sub('(?i)(' + '|'.join(map(re.escape, l)) + ')', r'<mark>\1</mark>', s)
'<mark>Check</mark> highlight function'
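If you want this as a reusable helper, here is a minimal sketch. The function name highlight and the two guards are my additions, not something the question asks for: empty stubs are dropped because an empty alternative would match at every position, and longer stubs are tried first so that 'sample' wins over 'sam' when both are given.

```python
import re

def highlight(text, stubs):
    """Wrap every case-insensitive occurrence of a stub in <mark> tags."""
    # Drop empty stubs: an empty alternative would match at every position.
    stubs = [s for s in stubs if s]
    if not stubs:
        return text
    # Try longer stubs first so 'sample' is preferred over 'sam'.
    stubs = sorted(stubs, key=len, reverse=True)
    pattern = '(?i)(' + '|'.join(map(re.escape, stubs)) + ')'
    # \1 is the text as it appears in the input, so its case is retained.
    return re.sub(pattern, r'<mark>\1</mark>', text)

print(highlight('Check highlight function', ['check']))
```

Since the output is HTML, you would probably also want to escape the text before marking it up, but that is outside what this sketch covers.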

Upvotes: 1

abarnert

Reputation: 365707

The simple way to do this is to not try to build a dict mapping every single word to its marked-up equivalent, and just use a capturing group and a reference to it. Then you can just use the IGNORECASE flag to do a case-insensitive search.

pattern = re.compile('({})'.format('|'.join(map(re.escape, words))),
                     re.IGNORECASE)
text = pattern.sub(r'<mark>\1</mark>', text_to_replace)

For example, if words were ['sam'] and text_to_replace were:

I am Sam. Sam I am. I will not eat green eggs and spam.

… then text will be:

I am <mark>Sam</mark>. <mark>Sam</mark> I am. I will not eat green eggs and spam.
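To try this end to end, here is a self-contained sketch of the same approach. The wrapper name mark and the stub list ['sam'] are assumptions made for the demonstration; the pattern and the substitution are the ones shown above.

```python
import re

def mark(words, text_to_replace):
    # Escape each stub so characters like '.' are matched literally.
    pattern = re.compile('({})'.format('|'.join(map(re.escape, words))),
                         re.IGNORECASE)
    # \1 is whatever text actually matched, so its original case is kept.
    return pattern.sub(r'<mark>\1</mark>', text_to_replace)

text = mark(['sam'], 'I am Sam. Sam I am. I will not eat green eggs and spam.')
print(text)
```

The re.escape call matters for stubs that contain regex metacharacters: an unescaped stub like e.g. would be read as "e, any character, g, any character" and would match eggs, whereas the escaped version only matches the literal text.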

If you really did want to do it your way, you could. For example:

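# Assumes the search stubs are all lowercase, so the lowercased match is a key in rep.
text = pattern.sub(lambda m: rep[re.escape(m.group(0).lower())]
                   .replace(m.group(0).lower(), m.group(0)),
                   text_to_replace)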

But that would be kind of silly. You're building a dict with 'sam' embedded in the value, just so you can replace that 'sam' with the 'Sam' that you actually matched.
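If you do want a replacement function (say, because the markup might later vary per match), you can still skip the dict and build the replacement from the match itself. This is only a sketch; the stub list, the sample text, and the name wrap are made up for illustration.

```python
import re

words = ['sam', 'ring']  # assumed search stubs
pattern = re.compile('|'.join(map(re.escape, words)), re.IGNORECASE)

def wrap(m):
    # m.group(0) is the matched text exactly as it appears in the input.
    return '<mark>%s</mark>' % m.group(0)

text = pattern.sub(wrap, 'Sam has a RING')
print(text)
```

No capturing group is needed here, because group 0 is always the whole match.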


See Grouping in the Regular Expression HOWTO for more on groups and references, and the re.sub docs for specifics on using references in substitutions.

Upvotes: 1
