Reputation: 15872
I have some larger text which in essence looks like this:
abc12..manycharshere...hi - abc23...manyothercharshere...jk
There are two items, each starting with "abc"; the numbers (12 and 23) are of interest, as are the "hi" and "jk" at the end.
I would like to create a regular expression that extracts the number, but only if the two characters at the end match. That is, I am looking for the number that belongs to "jk". However, the following regular expression matches the whole string and thus returns "12", not "23", even though the middle part is matched non-greedily:
abc([0-9]+).*?jk
Is there a way to construct a regular expression which matches text like the one above, i.e. retrieving "23" for items ending in "jk"?
Basically I need something like "match abc followed by a number, but only if there is a jk at the end, before another instance of abc followed by a number appears".
Note: the texts/matches are an abstraction here; the actual text is more complicated, especially the things that can appear as "manyothercharshere". I simplified it to show the underlying problem more clearly.
Upvotes: 0
Views: 364
Reputation: 298123
Being non-greedy does not change the rule that the first match is returned. So abc([0-9]+).*?jk will find the first jk after “abc number” rather than the last one, but it will still match the first “abc number”.
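To see this behaviour concretely, here is a minimal sketch using the sample string from the question. Python's re module is only my choice for illustration, since the question does not name a regex engine; the variable names are made up for the example.

```python
import re

# Sample text from the question.
text = "abc12..manycharshere...hi - abc23...manyothercharshere...jk"

# The leftmost possible match wins: the engine starts at the first "abc",
# and the lazy .*? simply keeps expanding until it reaches a "jk".
m = re.search(r"abc([0-9]+).*?jk", text)

first_number = m.group(1)               # "12", not "23"
spans_everything = m.group(0) == text   # True: the match covers the whole string
```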
One way to solve this is to specify that the dot must not match at a position where another abc([0-9]+) begins:
abc([0-9]+)((?!abc([0-9]+)).)*jk
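As a rough check of this pattern, the following sketch runs it against the sample string from the question. It assumes Python's re module, which the thread does not specify; other engines with lookahead support should behave similarly.

```python
import re

# Sample text from the question.
text = "abc12..manycharshere...hi - abc23...manyothercharshere...jk"

# Each "." is guarded by a negative lookahead, so the filler between the
# number and "jk" can never step onto the start of another "abc<number>".
tempered = re.compile(r"abc([0-9]+)((?!abc([0-9]+)).)*jk")

tempered_match = tempered.search(text)
tempered_number = tempered_match.group(1)   # "23"
tempered_whole = tempered_match.group(0)    # "abc23...manyothercharshere...jk"
```

The attempt starting at abc12 fails because the guarded dot stops in front of abc23 without having seen a jk, so the engine moves on to the second item.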
If it is not important that the entire pattern be an exact match, you can do it more simply:
.*(abc([0-9]+).*?jk)
In this case, it’s group 1 which contains your intended match; the number itself is in group 2. The pattern uses a greedy match-all to ensure that the last possible “abc number” is matched within the group.
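A small sketch of the greedy-prefix variant, again assuming Python's re module for illustration. Note that in Python the dot does not match newlines unless re.DOTALL is passed, so for multi-line input that flag would probably be needed.

```python
import re

# Sample text from the question.
text = "abc12..manycharshere...hi - abc23...manyothercharshere...jk"

# The leading greedy .* first runs to the end of the text and then backtracks,
# so the group starts at the last "abc<number>" that is still followed by "jk".
greedy_match = re.search(r".*(abc([0-9]+).*?jk)", text)

greedy_item = greedy_match.group(1)     # "abc23...manyothercharshere...jk"
greedy_number = greedy_match.group(2)   # "23"
```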
Upvotes: 1
Reputation: 424983
Assuming that hyphen separates "items", this regex will capture the numbers from the target item:
abc([0-9]+)[^-]*?jk
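A quick sketch of how this behaves, assuming Python's re module. The second input is a hypothetical item that contains a hyphen of its own, which shows where the "hyphen separates items" assumption matters.

```python
import re

# Sample text from the question.
text = "abc12..manycharshere...hi - abc23...manyothercharshere...jk"
pattern = re.compile(r"abc([0-9]+)[^-]*?jk")

hyphen_number = pattern.search(text).group(1)   # "23"

# Hypothetical item with a hyphen inside it: the filler cannot cross the "-",
# so nothing matches.
no_match = pattern.search("abc23..x-y..jk")      # None
```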
Upvotes: 0
Reputation: 174696
I think you want something like this:
abc([0-9]+)(?=(?:(?!jk|abc[0-9]).)*jk)
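Because the part after the number sits inside a lookahead, the match itself is only "abc" plus the digits, and the pattern can find several items in one pass. A minimal sketch, assuming Python's re module; the three-item string is a made-up example.

```python
import re

pattern = re.compile(r"abc([0-9]+)(?=(?:(?!jk|abc[0-9]).)*jk)")

# Sample text from the question.
text = "abc12..manycharshere...hi - abc23...manyothercharshere...jk"
lookahead_numbers = pattern.findall(text)    # ["23"]

# Hypothetical text with three items, two of which end in "jk".
several = "abc1..jk - abc2..hi - abc3..jk"
several_numbers = pattern.findall(several)   # ["1", "3"]
```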
Upvotes: 2
Reputation: 784958
You need to use a negative lookahead here to make it work:
abc(?!.*?abc)([0-9]+).*?jk
Here (?!.*?abc) is a negative lookahead that makes sure abc is matched only where it is NOT followed by another abc, thus making sure the closest string between abc and jk is matched.
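A minimal sketch of this pattern, assuming Python's re module. As far as I can tell, the lookahead only accepts the last abc in the text, so the second, hypothetical input (where the jk item comes first) yields no match.

```python
import re

pattern = re.compile(r"abc(?!.*?abc)([0-9]+).*?jk")

# Sample text from the question.
text = "abc12..manycharshere...hi - abc23...manyothercharshere...jk"
last_item_number = pattern.search(text).group(1)   # "23"

# Hypothetical reordering: the "jk" item is no longer the last one, so the
# only "abc" that passes the lookahead has no "jk" after it.
reordered = "abc12...jk - abc23...hi"
reordered_match = pattern.search(reordered)         # None
```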
Upvotes: 1