Victor Wang

Reputation: 947

How to handle mixed case sensitivity in a regex?

I have a list of keywords to search for. Most of them are case-insensitive, but a few are case-sensitive, such as IT or I.T. for information technology. Usually I join all the keywords together with "|" and set the flag to re.I, but this causes trouble for the case-sensitive keywords. Is there an easy way around this, or do I have to run a separate search for the case-sensitive ones? Thank you!

keywords = ["internal control", "IT",... and many more]
patterns = r"\b(" + "|".join(keywords) + r")\b"
m = re.findall(patterns, text, flags=re.I)

Upvotes: 1

Views: 230

Answers (2)

Andrej Kesely

Reputation: 195623

You can use the (?-i:...) inline modifier to turn off case-insensitive matching for a single group while the rest of the pattern stays under re.I. Note that it works only on Python 3.6+:

import re

s = "Internal control, it IT it's, Keyword2"
keywords = ["internal control", "IT", "keyword2"]
# All-uppercase keywords are wrapped in (?-i:...) so they match case-sensitively
pattern = '|'.join(
    r'((?-i:\b{}\b))'.format(re.escape(k)) if k.upper() == k
    else r'(\b{}\b)'.format(re.escape(k))
    for k in keywords
)
print(re.findall(pattern, s, flags=re.I))

Prints:

[('Internal control', '', ''), ('', 'IT', ''), ('', '', 'Keyword2')]

From the Python 3.6 documentation:

(?imsx-imsx:...)

(Zero or more letters from the set 'i', 'm', 's', 'x', optionally followed by '-' followed by one or more letters from the same set.) The letters set or removes the corresponding flags: re.I (ignore case), re.M (multi-line), re.S (dot matches all), and re.X (verbose), for the part of the expression. (The flags are described in Module Contents.)
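If a flat list of matched strings is more convenient than one tuple per keyword, the same idea can be sketched with a single non-capturing alternation. This is only a sketch under some assumptions: `build_pattern` and the explicit `case_sensitive` set are hypothetical names that replace the all-uppercase heuristic above, and lookarounds are used instead of `\b` so that a keyword ending in a dot, such as "I.T.", can still match.

```python
import re

def build_pattern(keywords, case_sensitive):
    """Join keywords into one alternation; wrap case-sensitive ones in (?-i:...)."""
    parts = []
    for k in keywords:
        escaped = re.escape(k)
        # (?-i:...) switches off re.I for this alternative only (Python 3.6+)
        parts.append(f"(?-i:{escaped})" if k in case_sensitive else escaped)
    # (?<!\w) and (?!\w) act as word edges that also work next to punctuation
    return re.compile(r"(?<!\w)(?:" + "|".join(parts) + r")(?!\w)", flags=re.I)

keywords = ["internal control", "IT", "I.T.", "keyword2"]
case_sensitive = {"IT", "I.T."}  # hypothetical: listed explicitly by the caller
pattern = build_pattern(keywords, case_sensitive)
matches = pattern.findall("Internal control, it IT it's, I.T. i.t. Keyword2")
print(matches)
```

Because the pattern has no capturing groups, findall returns the matched text directly, and lowercase "it" and "i.t." are skipped.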

Upvotes: 2

SanV

Reputation: 945

(Posting this as an answer because it is too much text for a comment.)
I still think two separate searches would be cleaner and simpler, so this may be academic: you could possibly use some combination of conditional regex and optional mode modifiers, as indicated in the respective links.
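For what it's worth, the two-search approach might look roughly like the sketch below. The function name `find_keywords` and the split into `insensitive` and `sensitive` lists are assumptions made for illustration, not something from the question; results from both passes are merged and ordered by position in the text.

```python
import re

def find_keywords(text, insensitive, sensitive):
    """Run one case-insensitive and one case-sensitive search, then merge by position."""
    def compile_group(words, flags=0):
        body = "|".join(re.escape(w) for w in words)
        # Lookarounds serve as word edges, including next to punctuation
        return re.compile(r"(?<!\w)(?:" + body + r")(?!\w)", flags)

    hits = []
    if insensitive:
        hits += [(m.start(), m.group()) for m in compile_group(insensitive, re.I).finditer(text)]
    if sensitive:
        hits += [(m.start(), m.group()) for m in compile_group(sensitive).finditer(text)]
    # Sort by start offset so the output follows the order of the text
    return [word for _, word in sorted(hits)]

found = find_keywords(
    "Internal control, it IT it's, Keyword2",
    insensitive=["internal control", "keyword2"],
    sensitive=["IT"],
)
print(found)
```

This avoids inline modifiers entirely, so it also works on Python versions before 3.6, at the cost of scanning the text twice.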

Upvotes: 1
