gointern

Reputation: 33

Regex is too greedy. Cannot find a way to fix it

Is there a way to fix the following regex? I have included an example on regex101. It captures too much: the match starts at the wrong [ tag and swallows everything up to the link I actually want. It does find the https link it is supposed to, but in the process I lose the text in between and the other tag.

https://regex101.com/r/OPRCuh/1

regex:

\[(.+?)\]\((https.+?)\)

sample text

_“[Developer Interview](/blog/tags/developer_interview.html)” is a new series here at Semaphore blog. We’ll interview developers from some of the companies using [text text text](https://textapp.com) to find out how they work and share their insights with you.

Upvotes: 1

Views: 113

Answers (1)

Wiktor Stribiżew

Reputation: 626709

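The . pattern matches any char other than a line break char, so it can match [, ], ( and ) as well. Making the quantifier lazy does not prevent that: .+? only starts with the shortest match and is still expanded whenever the rest of the pattern fails. Since the regex engine parses the string from left to right, it finds the first [, matches up to the ] after Interview, and matches the ( before /blog. That ( is not followed by https, so the engine backtracks and lets .+? keep consuming chars, including ](/blog/tags/developer_interview.html) and the text after it, until it reaches a ]( that is followed by https. The result is a valid match, but one that starts at the first [ instead of at [text text text].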

You may use

r'\[([^][]*)]\((https[^()]*)\)'

See the regex demo

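The [^][]* pattern matches zero or more chars other than [ and ], and [^()]* matches zero or more chars other than ( and ). Because neither part can cross a bracket or a parenthesis, a match can no longer span from one link into the next.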
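To see the difference, here is a minimal Python sketch that runs both patterns against the sample text from the question. The variable names (text, lazy_dot, bracket_safe and so on) are only illustrative, and the r'' prefix on the pattern above is taken to mean Python is the target language.

```python
import re

# Sample text from the question, split across lines for readability.
text = (
    "_“[Developer Interview](/blog/tags/developer_interview.html)” is a new "
    "series here at Semaphore blog. We’ll interview developers from some of "
    "the companies using [text text text](https://textapp.com) to find out "
    "how they work and share their insights with you."
)

# Original pattern: the lazy dot can still run across ], ( and ).
lazy_dot = re.compile(r'\[(.+?)\]\((https.+?)\)')

# Suggested pattern: negated character classes cannot cross the delimiters.
bracket_safe = re.compile(r'\[([^][]*)]\((https[^()]*)\)')

lazy_matches = lazy_dot.findall(text)
safe_matches = bracket_safe.findall(text)

for label, url in lazy_matches:
    print("lazy dot label:", label)
    print("lazy dot url:  ", url)

for label, url in safe_matches:
    print("bracket-safe label:", label)
    print("bracket-safe url:  ", url)
```

With the original pattern, the label group starts at Developer Interview and runs all the way through text text text. With the negated character classes, the only match is the text text text link, and the relative /blog link is skipped because it does not start with https.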

Upvotes: 1
