Reputation: 2605
I have a small regex problem involving two different terms.
I want to perform the following two regex substitutions in a single substitution statement.
clntxt = re.sub('(?i)United States', 'USA', "united states")
# output: USA
clntxt = re.sub('US', 'USA', "US and us")
# output: USA and us
I need something like
clntxt = re.sub('(?i)United States|(?s)US', 'USA', "united states and US and us")
# output: USA and USA and us
How can I achieve the above?
Upvotes: 5
Views: 4816
Reputation: 786
In every Python version, a bare (?i) turns on the "ignore case" flag for the entire expression, so it cannot be limited to one alternative. From the official documentation:
(?aiLmsux)
(One or more letters from the set 'a', 'i', 'L', 'm', 's', 'u', 'x'.) The group matches the empty string; the letters set the corresponding flags: re.A (ASCII-only matching), re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), and re.X (verbose), for the entire regular expression. (The flags are described in Module Contents.) This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.compile() function. Flags should be used first in the expression string.
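To make the effect of a leading global flag concrete, here is a small sketch run against the question's input. The variable names are only illustrative and not part of the question.

```python
import re

text = "united states and US and us"

# A leading (?i) applies to BOTH alternatives, so the lowercase "us"
# is replaced as well, which is not what the question asks for.
result_global = re.sub(r"(?i)United States|US", "USA", text)
print(result_global)  # USA and USA and USA
```

Passing flags=re.IGNORECASE instead of writing (?i) should behave the same way here, since both forms set the flag for the whole pattern.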
Since Python 3.6, however, you can toggle flags for just a part of the expression:
(?imsx-imsx:...)
(Zero or more letters from the set 'i', 'm', 's', 'x', optionally followed by '-' followed by one or more letters from the same set.) The letters set or removes the corresponding flags: re.I (ignore case), re.M (multi-line), re.S (dot matches all), and re.X (verbose), for the part of the expression. (The flags are described in Module Contents.)
New in version 3.6.
For example, (?i:foo)bar matches foobar and FOObar but not fooBAR. So to answer your question:
>>> re.sub('(?i:United States)|US', 'USA', 'united states and US and us')
'USA and USA and us'
Note this only works in Python 3.6+.
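If you want to reuse the substitution, something like the sketch below might be a starting point. The helper name normalize_country and the \b word boundaries are my own additions, not part of the question; the boundaries are there on the assumption that you do not want "US" replaced inside longer words such as "USA" or "BUS".

```python
import re

# Scoped flag: only "United States" is case-insensitive; "US" stays case-sensitive.
# The \b word boundaries are an assumption added for illustration; they keep
# "US" from matching inside longer words such as "USA" or "BUS".
PATTERN = re.compile(r"\b(?:(?i:United States)|US)\b")

def normalize_country(text):
    """Replace 'United States' (any case) and uppercase 'US' with 'USA'."""
    return PATTERN.sub("USA", text)

print(normalize_country("united states and US and us"))  # USA and USA and us
print(normalize_country("USA and BUS"))                  # USA and BUS
```

Without the word boundaries, running the substitution over text that already contains "USA" would turn it into "USAA", so whether you need them depends on your input.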
Upvotes: 9