centic

Reputation: 15872

Regular expression non-greedy but still

I have some larger text which in essence looks like this:

abc12..manycharshere...hi - abc23...manyothercharshere...jk

Obviously there are two items, each starting with "abc", the numbers (12 and 23) are interesting as well as the "hi" and "jk" at the end.

I would like to create a regular expression that extracts the number, but only from the item whose two trailing characters match, i.e. I am looking for the number that belongs to "jk". However, the following regular expression matches the whole string and thus returns "12", not "23", even though the part in between is matched non-greedily:

abc([0-9]+).*?jk

Is there a way to construct a regular expression which matches text like the one above, i.e. retrieving "23" for items ending in "jk"?

Basically I would need something like "match abc followed by a number, but only if there is 'jk' at the end before another instance of abc followed by a number appears".

Note: the texts/matches are an abstraction here. The actual text is more complicated, especially the things that can appear as "manyothercharshere"; I simplified to show the underlying problem more clearly.

Upvotes: 0

Views: 364

Answers (5)

Holger

Reputation: 298123

Being non-greedy does not change the rule that the leftmost match is returned. So abc([0-9]+).*?jk will stop at the first jk after “abcnumber” rather than the last one, but it will still start at the first “abcnumber”.
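To make that concrete, here is a minimal sketch using Python's re module (the choice of Python is an assumption; the question does not name a language). The start of the match is fixed at the first "abc12", and laziness only affects where the match ends.

```python
import re

text = "abc12..manycharshere...hi - abc23...manyothercharshere...jk"

# Start positions are tried left to right, so the match begins at "abc12".
# The lazy .*? then only decides which "jk" after that start is used.
m = re.search(r"abc([0-9]+).*?jk", text)
first_number = m.group(1)
matched_whole = m.group(0) == text  # the match spans the entire sample string
print(first_number, matched_whole)  # 12 True
```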

One way to solve this is to specify that the dot must not match at a position where another abc([0-9]+) begins:

abc([0-9]+)((?!abc([0-9]+)).)*jk
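A short sketch of this pattern in Python (again an assumed language), applied to the sample text from the question. The attempt starting at "abc12" fails because the guarded dot refuses to step over "abc23", so the engine moves on and succeeds at the second item.

```python
import re

text = "abc12..manycharshere...hi - abc23...manyothercharshere...jk"

# Each dot is guarded by a negative lookahead, so the repeated part
# can never run across the start of another "abc<number>".
tempered = re.compile(r"abc([0-9]+)((?!abc([0-9]+)).)*jk")
m = tempered.search(text)
number = m.group(1)
matched = m.group(0)
print(number, matched)  # 23 abc23...manyothercharshere...jk
```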

If it is not important for the overall match to be exactly the item, you can do it more simply:

.*(abc([0-9]+).*?jk)

In this case, it’s group 1 which contains your intended match (the number itself is in group 2). The pattern uses a greedy .* to ensure that the last possible “abcnumber” is matched within the group.
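A sketch of the greedy-prefix variant in Python (assumed language). With the extra outer group, the digits end up in group 2. The second input is a hypothetical example, not from the question, showing that this approach reports only the last matching item.

```python
import re

text = "abc12..manycharshere...hi - abc23...manyothercharshere...jk"

# The leading greedy .* grabs as much as it can and backtracks only as far
# as needed, so the group starts at the last "abc<number>" followed by "jk".
m = re.search(r".*(abc([0-9]+).*?jk)", text)
item = m.group(1)    # the whole "abc...jk" item
number = m.group(2)  # just the digits

# Hypothetical input with two items ending in "jk": only the last is found.
many = "abc1..jk - abc2..hi - abc3..jk"
last_only = re.search(r".*(abc([0-9]+).*?jk)", many).group(2)
print(item, number, last_only)
```

If every item ending in "jk" is needed rather than just the last one, the guarded-dot pattern is the one to iterate with.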

Upvotes: 1

TheLostMind

Reputation: 36304

Use a regex like this: .*abc([0-9]+).*?jk


Upvotes: 3

Bohemian

Reputation: 424983

Assuming that a hyphen separates "items" and never occurs inside one, this regex will capture the numbers from the target item:

abc([0-9]+)[^-]*?jk
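A minimal Python sketch (assumed language) of the hyphen-bounded pattern. The second input is a hypothetical example with several items; it only works as long as the hyphen assumption above holds.

```python
import re

text = "abc12..manycharshere...hi - abc23...manyothercharshere...jk"

# [^-]*? cannot cross a hyphen, so a match can never span two items.
pattern = re.compile(r"abc([0-9]+)[^-]*?jk")
numbers = pattern.findall(text)

# Hypothetical input with several items, two of which end in "jk".
several = pattern.findall("abc12..hi - abc23..jk - abc34..jk")
print(numbers, several)  # ['23'] ['23', '34']
```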


Upvotes: 0

Avinash Raj

Reputation: 174696

I think you want something like this:

abc([0-9]+)(?=(?:(?!jk|abc[0-9]).)*jk)
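A sketch of how this behaves in Python (assumed language). The match itself covers only "abc" plus the digits; the lookahead then checks that "jk" is reached before another "abc<digit>". The second input is a hypothetical example showing that several items can be collected in one pass.

```python
import re

text = "abc12..manycharshere...hi - abc23...manyothercharshere...jk"

# The lookahead scans forward one character at a time, stopping at "jk"
# or at the next "abc<digit>", and succeeds only if "jk" comes first.
pattern = re.compile(r"abc([0-9]+)(?=(?:(?!jk|abc[0-9]).)*jk)")
numbers = pattern.findall(text)

# Hypothetical input: the first and third items end in "jk", the second does not.
several = pattern.findall("abc1..jk - abc2..hi - abc3..jk")
print(numbers, several)  # ['23'] ['1', '3']
```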


Upvotes: 2

anubhava

Reputation: 784958

You need to use negative lookahead here to make it work:

abc(?!.*?abc)([0-9]+).*?jk


Here (?!.*?abc) is a negative lookahead that makes sure abc is matched only where it is NOT followed by another abc, thus making sure that the closest string between abc and jk is matched.
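A short Python sketch (assumed language) of this pattern on the sample text. The second input is a hypothetical example showing one consequence of the lookahead: only the last abc in the text can ever match, so nothing is found when that last item does not end in "jk".

```python
import re

text = "abc12..manycharshere...hi - abc23...manyothercharshere...jk"

# (?!.*?abc) rejects every "abc" that still has another "abc" after it,
# which leaves only the last "abc" in the text as a candidate.
pattern = re.compile(r"abc(?!.*?abc)([0-9]+).*?jk")
number = pattern.search(text).group(1)

# Hypothetical input where the last item ends in "hi": no match at all.
no_match = pattern.search("abc12..jk - abc23..hi")
print(number, no_match)  # 23 None
```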

Upvotes: 1
