Reputation: 1208
I'm trying to write a filter in Django that highlights words based on a search query. For example, if my string contains
this is a sample string that I want to highlight using my filter
and my search stubs are sam and ring, my desired output would be:
this is a <mark>sam</mark>ple st<mark>ring</mark> that I want to highlight using my filter
I'm using the answer from here to replace multiple words. I've presented the code below:
import re
words = search_stubs.split()
rep = dict((re.escape(k), '<mark>%s</mark>'%(k)) for k in words)
pattern = re.compile('|'.join(rep.keys()))
text = pattern.sub(lambda m : rep[re.escape(m.group(0))], text_to_replace)
However, this breaks when the case differs. For example, if I have the string
Check highlight function
and my search stub is check, the pattern no longer matches and nothing is highlighted.
The desired output in this case would naturally be:
<mark>Check</mark> highlight function
Upvotes: 0
Views: 770
Reputation: 174706
You don't need a dictionary here. The inline flag (?i), the case-insensitive modifier, makes the whole pattern match case-insensitively.
>>> s = "this is a sample string that I want to highlight using my filter"
>>> l = ['sam', 'ring']
>>> re.sub('(?i)(' + '|'.join(map(re.escape, l)) + ')', r'<mark>\1</mark>', s)
'this is a <mark>sam</mark>ple st<mark>ring</mark> that I want to highlight using my filter'
Example 2:
>>> s = 'Check highlight function'
>>> l = ['check']
>>> re.sub('(?i)(' + '|'.join(map(re.escape, l)) + ')', r'<mark>\1</mark>', s)
'<mark>Check</mark> highlight function'
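The same idea can be wrapped in a small helper. This is only a minimal sketch, not the asker's actual filter: the name highlight and the guard for an empty stub list are assumptions added for illustration. Without that guard the pattern becomes (?i)(), which matches the empty string at every position.

```python
import re

def highlight(text, stubs):
    """Wrap every case-insensitive occurrence of any stub in <mark> tags.

    `highlight` is a hypothetical helper name, not part of Django or re.
    """
    stubs = [s for s in stubs if s]  # drop empty stubs
    if not stubs:
        # An empty alternation would match the empty string everywhere.
        return text
    pattern = '(?i)(' + '|'.join(map(re.escape, stubs)) + ')'
    return re.sub(pattern, r'<mark>\1</mark>', text)
```

In a real template filter the surrounding text would probably also need HTML escaping before the tags are added; that step is left out here.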
Upvotes: 1
Reputation: 365707
The simple way to do this is to not try to build a dict mapping every single word to its marked-up equivalent, and just use a capturing group and a reference to it. Then you can just use the IGNORECASE flag to do a case-insensitive search.
pattern = re.compile('({})'.format('|'.join(map(re.escape, words))),
                     re.IGNORECASE)
text = pattern.sub(r'<mark>\1</mark>', text_to_replace)
For example, if text_to_replace were:
I am Sam. Sam I am. I will not eat green eggs and spam.
… then text would be:
I am <mark>Sam</mark>. <mark>Sam</mark> I am. I will not eat green eggs and spam.
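That result can be reproduced end to end with the short script below. The answer never states which stubs it used, so words = ['sam'] is an assumption made for this sketch.

```python
import re

# Assumption: the search stubs for this example are just ['sam'].
words = ['sam']
text_to_replace = 'I am Sam. Sam I am. I will not eat green eggs and spam.'

# One capturing group around the alternation of escaped stubs.
pattern = re.compile('({})'.format('|'.join(map(re.escape, words))),
                     re.IGNORECASE)

# \1 re-inserts the text exactly as it was matched, so 'Sam' keeps its capital.
text = pattern.sub(r'<mark>\1</mark>', text_to_replace)
print(text)
```

Note that spam is left alone because it does not contain the substring sam.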
If you really did want to do it your way, you could. For example:
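# Assumes every stub in words is lowercase and pattern was compiled with re.IGNORECASE.
text = pattern.sub(
    lambda m: rep[re.escape(m.group(0).lower())].replace(m.group(0).lower(), m.group(0)),
    text_to_replace)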
But that would be kind of silly. You're building a dict with 'sam' embedded in the value, just so you can replace that 'sam' with the 'Sam' that you actually matched.
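If a callable replacement is still preferred, it can wrap the matched text directly and skip the dict altogether. The following is a sketch under assumptions: the function name mark and the stub list ['sam'] are made up for illustration.

```python
import re

words = ['sam']  # assumed stubs, for illustration only
pattern = re.compile('|'.join(map(re.escape, words)), re.IGNORECASE)

def mark(m):
    # m.group(0) is the text as it appeared in the input, original case included.
    return '<mark>{}</mark>'.format(m.group(0))

text = pattern.sub(mark, 'I am Sam. Sam I am.')
print(text)
```

Because the function receives the match object, no capturing group is needed in the pattern.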
See Grouping in the Regular Expression HOWTO for more on groups and references, and the re.sub docs for specifics on using references in substitutions.
Upvotes: 1