Reputation: 947
I have a list of keywords to search for. Most of them are case-insensitive, but a few are case-sensitive, such as IT or I.T. for information technology. Usually I join all the keywords together with "|" and set the flag to re.I, but this causes trouble for the case-sensitive keywords. Is there an easy way to get around this, or do I have to run a separate search for the case-sensitive ones? Thank you!
import re

keywords = ["internal control", "IT"]  # ... and many more
patterns = r"\b(" + "|".join(keywords) + r")\b"
m = re.findall(patterns, text, flags=re.I)
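A minimal sketch of the problem, using a made-up sample sentence (the text value below is hypothetical, not from the question): because re.I applies to the whole pattern, the keyword "IT" also matches the ordinary word "it".

```python
import re

# Hypothetical sample text; the second "it" is the pronoun, not the acronym.
text = "IT reviewed internal control, and it passed."
keywords = ["internal control", "IT"]
pattern = r"\b(" + "|".join(keywords) + r")\b"
matches = re.findall(pattern, text, flags=re.I)
print(matches)  # ['IT', 'internal control', 'it']
```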
Upvotes: 1
Views: 230
Reputation: 195623
You can use the (?-i:...) modifier to turn off case-insensitive matching for that group only. It works only on Python 3.6+:
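A small illustration of the scoped flag on its own, with made-up sample strings: the outer re.I stays in effect everywhere except inside the group, and the reverse form (?i:...) enables ignore-case locally when no global flag is set.

```python
import re

words = "it It IT"
# re.I is on for the whole pattern, but (?-i:...) disables it inside the group.
sensitive_only = re.findall(r"\b(?-i:IT)\b", words, flags=re.I)
# The reverse: no global flag, but (?i:...) enables ignore-case locally.
local_ignore = re.findall(r"\b(?i:it)\b", words)
print(sensitive_only)  # ['IT']
print(local_ignore)    # ['it', 'It', 'IT']
```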
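import re

s = "Internal control, it IT it's, Keyword2"
keywords = ["internal control", "IT", "keyword2"]
# All-uppercase keywords (acronyms such as "IT") are wrapped in (?-i:...),
# which switches off re.I for that alternative only.
parts = [r'((?-i:\b{}\b))'.format(re.escape(k)) if k.upper() == k
         else r'(\b{}\b)'.format(re.escape(k)) for k in keywords]
pattern = '|'.join(parts)
print(re.findall(pattern, s, flags=re.I))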
Prints:
[('Internal control', '', ''), ('', 'IT', ''), ('', '', 'Keyword2')]
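Two refinements worth considering, offered as a sketch rather than the answerer's code. First, re.findall returns one tuple per match because every alternative is a capturing group; since (?-i:...) is already non-capturing, dropping the extra parentheses yields a flat list of strings. Second, \b only marks a boundary next to a word character, so \bI\.T\.\b fails when "I.T." is followed by a space or the end of the text. The lookarounds (?<!\w) and (?!\w) behave like \b for ordinary words but also handle keywords that begin or end with punctuation. The sample sentence, the added "I.T." keyword, and the helper name alternative are illustrative.

```python
import re

s = "Internal control, it IT it's, I.T. staff, Keyword2"
keywords = ["internal control", "IT", "I.T.", "keyword2"]

def alternative(k):
    # (?<!\w) and (?!\w) act like \b around ordinary words, but also work
    # when the keyword starts or ends with punctuation, as in "I.T.".
    body = r"(?<!\w){}(?!\w)".format(re.escape(k))
    # All-uppercase keywords are matched case-sensitively via (?-i:...).
    return "(?-i:{})".format(body) if k.upper() == k else body

# No capturing groups, so findall returns a flat list of matched strings.
pattern = "|".join(alternative(k) for k in keywords)
result = re.findall(pattern, s, flags=re.I)
print(result)  # ['Internal control', 'IT', 'I.T.', 'Keyword2']
```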
From the Python 3.6 documentation:
(?imsx-imsx:...)
(Zero or more letters from the set 'i', 'm', 's', 'x', optionally followed by '-' followed by one or more letters from the same set.) The letters set or remove the corresponding flags: re.I (ignore case), re.M (multi-line), re.S (dot matches all), and re.X (verbose), for the part of the expression. (The flags are described in Module Contents.)
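To make the "set or remove" wording concrete, here is a sketch with a made-up string showing that a single group can both set one flag and remove another. (?s-i:...) turns on re.S and turns off re.I for the group only, while (?-i:...) by itself leaves the dot unable to match a newline.

```python
import re

text = "a\nb A\nB"
# 's' is set (dot matches "\n") and 'i' is removed, only inside the group.
both = re.findall(r"(?s-i:A.B)", text, flags=re.I)
# Only 'i' is removed here, so the dot still refuses to match "\n".
only_i = re.findall(r"(?-i:A.B)", text, flags=re.I)
print(both)    # ['A\nB']
print(only_i)  # []
```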
Upvotes: 2
Reputation: 945
(Posting this as an answer because it is too much text for a comment)
I still think two separate searches would be cleaner and simpler, so this may be academic: you could possibly use some combination of conditional regex and optional mode modifiers.
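A sketch of the two-search approach suggested above. The rule for deciding which keywords are case-sensitive (written entirely in uppercase) is borrowed from the first answer and is an assumption, and find_keywords is a hypothetical helper name. Each pass uses re.finditer so that the results can be merged back into document order by match position.

```python
import re

def find_keywords(text, keywords):
    # Assumption: a keyword written entirely in uppercase (e.g. "IT", "I.T.")
    # is case-sensitive; everything else is matched case-insensitively.
    sensitive = [k for k in keywords if k.upper() == k]
    insensitive = [k for k in keywords if k.upper() != k]

    def build(words):
        # Escape each keyword and require no word character on either side.
        alts = "|".join(re.escape(w) for w in words)
        return r"(?<!\w)(?:{})(?!\w)".format(alts)

    hits = []
    if insensitive:
        hits.extend(re.finditer(build(insensitive), text, flags=re.I))
    if sensitive:
        hits.extend(re.finditer(build(sensitive), text))
    # Merge the two passes back into document order.
    return [m.group(0) for m in sorted(hits, key=lambda m: m.start())]

s = "Internal control, it IT it's, Keyword2"
found = find_keywords(s, ["internal control", "IT", "keyword2"])
print(found)  # ['Internal control', 'IT', 'Keyword2']
```

Because the two passes run independently, overlapping matches from both lists are not deduplicated in this sketch.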
Upvotes: 1