William

Reputation: 344

What is the difference between .* and .*? in a regular expression?

I am trying to learn about regular expressions. While investigating the difference between re.match and re.search, I saw a (disputed) claim that re.match('(.*?)word(.*?)',string) was faster than re.search("word",string). I do not see the difference between .*? and .*, nor do I see a need for the trailing (.*?).

Upvotes: 0

Views: 457

Answers (2)

Pychopath

Reputation: 1580

See the documentation. The ? after * makes the quantifier non-greedy, i.e., it'll try to match as few repetitions as possible instead of as many as possible.

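In your example re.match('(.*?)word(.*?)',string), that means the leading group matches as few characters as possible, so the regex finds the earliest occurrence of word instead of the last. The trailing (.*?) is indeed pointless: nothing is required after it, so it always matches the empty string.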
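Here's a small sketch of that difference. The sample string is made up for illustration, and the greedy pattern (.*)word(.*) is only there for comparison; it isn't from the question.

```python
import re

# Made-up sample text containing "word" twice.
text = "a word b word c"

# Greedy: the leading .* grabs as much as it can, so it stops at the LAST "word".
greedy = re.match(r"(.*)word(.*)", text)

# Lazy: the leading .*? grabs as little as it can, so it stops at the FIRST "word".
lazy = re.match(r"(.*?)word(.*?)", text)

# Plain search for comparison: it also finds the first "word".
found = re.search("word", text)

print(repr(greedy.group(1)))  # 'a word b '
print(repr(greedy.group(2)))  # ' c'
print(repr(lazy.group(1)))    # 'a '
print(repr(lazy.group(2)))    # '' (the trailing lazy group matches nothing)
print(found.start())          # 2, same position the lazy match stops at
```

So the lazy re.match and re.search("word", text) locate the same occurrence; the only extra thing the match gives you is the text before it as group 1.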

Upvotes: 2

Random Davis

Reputation: 6857

To understand any regex, the first place you go should always be https://regex101.com/. In this case, here's what it says is the only difference between the two:

* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)

*? matches the previous token between zero and unlimited times, as few times as possible, expanding as needed (lazy)

From there, you can enter example text to test the expression in real time and see what the practical difference is.
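If you'd rather try it locally, here's a rough equivalent of that experiment in Python. The sample text and the <.*> patterns are just illustrative choices, not something from regex101 or the question.

```python
import re

# Made-up sample text with several angle-bracket tags.
sample = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the end, then gives back until the final '>' fits.
greedy_tags = re.findall(r"<.*>", sample)

# Lazy: .*? starts empty and expands only until the nearest '>' fits.
lazy_tags = re.findall(r"<.*?>", sample)

print(greedy_tags)  # ['<b>bold</b> and <i>italic</i>']
print(lazy_tags)    # ['<b>', '</b>', '<i>', '</i>']
```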

Upvotes: 0
