Manish Jindal

Reputation: 129

How is ? used in regular expressions in Python?

I have this snippet:

import re
print(re.sub(r'(-script\.pyw|\.exe)?', '', '.exe1.exe.exe'))

The output is 1. If I remove the ? from the above snippet and run it as

print(re.sub(r'(-script\.pyw|\.exe)', '','.exe1.exe.exe'))

The output is again the same. Although I am using ?, it is being greedy and replacing every '.exe' with an empty string. Is there any way to replace only the first occurrence?

Upvotes: 1

Views: 48

Answers (3)

Julio

Reputation: 5308

? is greedy, so if it can match, it will.

For example, against the string aaab, the pattern aaab? will match aaab rather than aaa.

To make ? non-greedy, add an extra ? (this is the same way you make * and + non-greedy).

So aaab?? will match just aaa. Yet aaab??c will still match aaabc, because the b has to be consumed for the following c to match.
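A quick way to check the greedy and lazy behaviour described above is a minimal sketch using re.match on the same example strings (the variable names are illustrative only):

```python
import re

s = "aaab"

# Greedy ?: takes the optional b whenever it can
greedy = re.match(r"aaab?", s).group()

# Lazy ??: prefers to skip the optional b
lazy = re.match(r"aaab??", s).group()

# Lazy ?? still takes the b when the rest of the pattern (the c) requires it
needed = re.match(r"aaab??c", "aaabc").group()

print(greedy, lazy, needed)  # aaab aaa aaabc
```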

Upvotes: 0

user844541

Reputation: 2958

The question mark makes the preceding token in the regular expression optional. Use

import re
print(re.sub(r'(-script\.pyw|\.exe)', '', '.exe1.exe.exe', count=1))

if you want to remove only the first match.
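To compare the default behaviour with count=1 on the string from the question, here is a small sketch (the names pattern, text, all_removed and first_removed are just for illustration). Passing count as a keyword also avoids the deprecation of positional count in recent Python versions:

```python
import re

pattern = r'(-script\.pyw|\.exe)'
text = '.exe1.exe.exe'

# count=0 (the default) replaces every non-overlapping match
all_removed = re.sub(pattern, '', text)

# count=1 stops after the first replacement
first_removed = re.sub(pattern, '', text, count=1)

print(all_removed)    # 1
print(first_removed)  # 1.exe.exe
```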

Upvotes: 0

Amadan

Reputation: 198324

re.sub(pattern, repl, string, count=0, flags=0)

This is the signature for the re.sub function. Notice the count parameter: if you want only the first occurrence replaced, pass count=1.

? acts as a non-greedy modifier when it follows a repetition operator (*, +, ? or {m,n}); anywhere else, it makes the preceding element optional. Thus your top expression replaces either -script.pyw, .exe, or nothing with nothing. Since replacing nothing with nothing doesn't change the string, the top version and the bottom version (where the empty string cannot be matched) give the same result.
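To make the "or nothing" case visible, the sketch below lists every match the optional pattern finds with re.finditer, then checks that both versions of re.sub agree. The names optional and required are hypothetical labels for the two patterns from the question:

```python
import re

text = '.exe1.exe.exe'
optional = r'(-script\.pyw|\.exe)?'   # pattern with the trailing ?
required = r'(-script\.pyw|\.exe)'    # pattern without it

# Every match found by the optional pattern; some of them are empty strings
matches = [m.group() for m in re.finditer(optional, text)]
print(matches)

# Replacing an empty match with '' changes nothing, so both calls agree
print(re.sub(optional, '', text) == re.sub(required, '', text))  # True
```

The empty entries in the printed list are the "nothing" matches; each one is replaced with an empty string, which leaves the text unchanged.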

Upvotes: 1
