Reputation: 3341
I have a list of strings. Some of them are of the form 123-...456. The variable portion "..." may be:
123-apple-456
123-banana-456
123-456
(note there's only one hyphen)
Any word other than "apple" or "banana" is invalid.
For these three cases, I would like to match "apple", "banana", and "", respectively. Note that I never want to capture the hyphen, but I always want to match it. If the string is not of the form 123-...456 as described above, then there is no match at all.
How do I write a regular expression to do this? Assume I have a flavor that allows lookahead, lookbehind, lookaround, and non-capturing groups.
The key observation here is that when you have either "apple" or "banana", you must also have the trailing hyphen, but you don't want to match it. And when you're matching the blank string, you must not have the trailing hyphen. A regex that encapsulates this assertion will be the right one, I think.
Upvotes: 334
Views: 310696
Reputation: 1121
Regular expression to test:
\d{3}-(?:(apple|banana)-|)\d{3}
It matches three digits and a hyphen, followed by either "apple-", "banana-", or nothing, and ending with three digits. It captures "apple" or "banana" if present; when neither is present, the capture is empty.
Tested on the following data set:
123-apple-456
123-banana-456
123-banana456
123banana-456
123-456
123456
123-coconut-456
123-123-456
123-apple456
Found matches:
Match 1
1. apple
Match 2
1. banana
Match 3
1.
Match 4
1.
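A minimal Python sketch of the test above, assuming each string is checked on its own with `re.search`. The names `PATTERN`, `SAMPLES`, and `captured_words` are illustrative and not part of the original answer. In Python an unmatched group 1 is `None`, so the sketch maps it to an empty string to mirror the empty captures listed above.

```python
import re

# Pattern from the answer: three digits, a hyphen, an optional "apple-" or
# "banana-", then three more digits.
PATTERN = re.compile(r"\d{3}-(?:(apple|banana)-|)\d{3}")

# The data set from the answer.
SAMPLES = [
    "123-apple-456",
    "123-banana-456",
    "123-banana456",
    "123banana-456",
    "123-456",
    "123456",
    "123-coconut-456",
    "123-123-456",
    "123-apple456",
]


def captured_words(strings):
    """Return (string, captured word) pairs for every string that matches."""
    results = []
    for s in strings:
        m = PATTERN.search(s)
        if m:
            # group(1) is None when the empty alternative matched.
            results.append((s, m.group(1) or ""))
    return results


for s, word in captured_words(SAMPLES):
    print(f"{s!r} -> {word!r}")
```

Because the pattern is unanchored, "123-123-456" matches on its leading "123-123", which appears to account for the fourth, empty match.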
Upvotes: 0
Reputation: 1
123-(?:(apple|banana)-)?456
The word in the middle is in capturing group 1 (.groups()[0]). If it doesn't exist, this returns null.
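As a rough Python illustration of reading group 1 via `.groups()[0]`: the helper name `middle_word` and the use of `fullmatch` (to require that the whole string fits the pattern) are assumptions made for this sketch. In Python the missing group shows up as `None`.

```python
import re

# Pattern from the answer: the word and its trailing hyphen are optional as a unit.
PATTERN = re.compile(r"123-(?:(apple|banana)-)?456")


def middle_word(s):
    """Return (True, word_or_None) if s matches, otherwise (False, None)."""
    m = PATTERN.fullmatch(s)  # assumption: the whole string must match
    if m is None:
        return (False, None)
    # groups()[0] is capturing group 1; it is None when the word is absent.
    return (True, m.groups()[0])


for s in ["123-apple-456", "123-456", "123-coconut-456", "123-apple456"]:
    print(s, middle_word(s))
```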
Upvotes: 0
Reputation: 5
"123-apple-456, 87568-555"
/(\d+-)(?:[a-z]*-?)*(\d+)/
\1\2
123-456, 87568-555
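The four lines above read as input, pattern, replacement, and result. A sketch of the same substitution in Python, assuming `re.sub` as the replacement call (the original answer does not name a language, and `strip_middle` is a hypothetical helper name):

```python
import re

# Group 1: leading digits plus hyphen; group 2: trailing digits.
# Any lowercase words and hyphens in between are matched but not kept.
PATTERN = re.compile(r"(\d+-)(?:[a-z]*-?)*(\d+)")


def strip_middle(text):
    """Replace each match with its two digit groups, dropping the middle."""
    return PATTERN.sub(r"\1\2", text)


print(strip_middle("123-apple-456, 87568-555"))
```

Note that this pattern accepts any lowercase word in the middle, not only "apple" or "banana".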
Upvotes: 0
Reputation: 1928
In JavaScript, try: /123-(apple(?=-)|banana(?=-)|(?!-))-?456/
Remember that the result is in group 1
Based on the input provided by Germán Rodríguez Herrera
Upvotes: 17
Reputation: 655697
The only way not to capture something is using look-around assertions:
(?<=123-)((apple|banana)(?=-456)|(?=456))
Because even with non-capturing groups (?:…) the whole regular expression captures their matched contents. But this regular expression matches only apple or banana if it’s preceded by 123- and followed by -456, or it matches the empty string if it’s preceded by 123- and followed by 456.
Lookaround | Name | What it Does |
---|---|---|
(?=foo) | Lookahead | Asserts that what immediately FOLLOWS the current position in the string is foo |
(?<=foo) | Lookbehind | Asserts that what immediately PRECEDES the current position in the string is foo |
(?!foo) | Negative Lookahead | Asserts that what immediately FOLLOWS the current position in the string is NOT foo |
(?<!foo) | Negative Lookbehind | Asserts that what immediately PRECEDES the current position in the string is NOT foo |
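A small Python sketch of the lookaround expression above. Since the digits and hyphens are only asserted, the whole match (`group(0)`) is already the word or the empty string. The helper name `match_word` is an assumption for illustration. Python's `re` accepts this lookbehind because `123-` has a fixed width.

```python
import re

# Lookbehind asserts "123-" before the match; the lookaheads assert what follows.
PATTERN = re.compile(r"(?<=123-)((apple|banana)(?=-456)|(?=456))")


def match_word(s):
    """Return the whole match (a word or ""), or None if nothing matches."""
    m = PATTERN.search(s)
    return m.group(0) if m else None


for s in ["123-apple-456", "123-banana-456", "123-456", "123-coconut-456"]:
    print(s, repr(match_word(s)))
```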
Upvotes: 502
Reputation: 1216
A variation of the expression by @Gumbo that makes use of \K for resetting match positions to prevent the inclusion of number blocks in the match. Usable in PCRE regex flavours.
123-\K(?:(?:apple|banana)(?=-456)|456\K)
Matches:
Match 1 apple
Match 2 banana
Match 3
Upvotes: 0
Reputation: 71
I have modified one of the answers (by @op1ekun):
123-(apple(?=-)|banana(?=-)|(?!-))-?456
The reason is that the answer from @op1ekun also matches "123-apple456", without the hyphen after apple.
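To check the hyphen behaviour described here, a short Python sketch of the modified expression follows. The helper name `group_one` is hypothetical. Each word carries a lookahead for its trailing hyphen, and the empty alternative uses a negative lookahead to refuse a second hyphen.

```python
import re

# apple/banana must be followed by "-"; the empty case must not be.
PATTERN = re.compile(r"123-(apple(?=-)|banana(?=-)|(?!-))-?456")


def group_one(s):
    """Return capturing group 1 if s matches, otherwise None."""
    m = PATTERN.search(s)
    return m.group(1) if m else None


for s in ["123-apple-456", "123-456", "123-apple456", "123--456"]:
    print(s, repr(group_one(s)))
```

Here the empty case yields an empty string rather than `None`, because the third alternative sits inside group 1 and matches with zero width.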
Upvotes: 6
Reputation: 294
By far the simplest (works for Python) is '123-(apple|banana)-?456'.
Upvotes: -5
Reputation: 4999
Try:
123-(?:(apple|banana|)-|)456
That will match apple, banana, or a blank string, followed by zero or one hyphen. I was wrong about not having a need for a capturing group. Silly me.
Upvotes: 10