Reputation: 495
I'm having trouble wrapping my head around how to find the 1st instance of something, then work "backwards" using Regex...
I have some strings where a product code is combined with a product name. Unfortunately, the delimiter (a dash) separating the product code from the product name is the same one used inside the product code.
The product code can have different numbers of delimiters. Some product codes have one dash, while others might have multiple dashes.
But, I know that all product names have a space.
So taking these two strings, for example:
I'd like to do the equivalent of:
So what I want to extract from the above 2 examples:
- "ABC-ER-015-30"
- "ABC-1234"
This works if there are no dashes in the Item Name:
(.*)-
But if there is a dash in the Item Name, it captures part of the Item Name.
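To make the failure concrete, here is a minimal Python sketch of the greedy pattern. The two input strings are hypothetical (the original examples are not shown here); they were made up so that the desired codes are "ABC-ER-015-30" and "ABC-1234". The helper name `greedy_code` is likewise just for illustration.

```python
import re

# Hypothetical inputs, invented to match the two desired codes.
plain = "ABC-ER-015-30-Some Product Name"   # no dash in the name
dashed = "ABC-1234-Another-Item Name"       # dash inside the name

# The pattern from the question: greedy "anything", then a dash.
greedy = re.compile(r"(.*)-")

def greedy_code(text):
    """Return group 1 of the greedy pattern, or None if there is no match."""
    m = greedy.match(text)
    return m.group(1) if m else None

print(greedy_code(plain))   # ABC-ER-015-30
print(greedy_code(dashed))  # ABC-1234-Another
```

Because `.*` is greedy, it runs to the last dash in the string, so any dash inside the name pulls part of the name into the capture.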
I feel like there's something really simple that I'm missing.
Upvotes: 0
Views: 674
Reputation: 163362
You could match 1+ uppercase chars or digits, then repeat matching a dash and 1+ uppercase chars or digits.
As you know that all product names have a space, you could add a positive lookahead asserting a dash, 1+ non-whitespace chars, followed by a space.
^[A-Z0-9]+(?:-[A-Z0-9]+)+(?=-\S+ )
- ^ : Start of string
- [A-Z0-9]+ : Match 1+ times A-Z0-9
- (?:-[A-Z0-9]+)+ : Repeat 1+ times matching - and 1+ times A-Z0-9
- (?=-\S+ ) : Positive lookahead, assert -, 1+ non-whitespace chars and a space

Another option is to make use of a capturing group instead of a positive lookahead:
^([A-Z0-9]+(?:-[A-Z0-9]+)+)-\S+
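As a rough check, both patterns can be tried in Python. This is only a sketch: the sample strings and the helper names `code_via_lookahead` and `code_via_capture` are assumptions for illustration, not part of the original question.

```python
import re

# Both patterns are copied as written in this answer.
LOOKAHEAD = re.compile(r"^[A-Z0-9]+(?:-[A-Z0-9]+)+(?=-\S+ )")
CAPTURE = re.compile(r"^([A-Z0-9]+(?:-[A-Z0-9]+)+)-\S+")

# Hypothetical inputs: code, dash, then a name that contains a space.
samples = [
    "ABC-ER-015-30-Some Product Name",
    "ABC-1234-Another-Item Name",
]

def code_via_lookahead(text):
    """The whole match is the product code; the lookahead consumes nothing."""
    m = LOOKAHEAD.search(text)
    return m.group(0) if m else None

def code_via_capture(text):
    """The product code is in group 1; the match itself runs into the name."""
    m = CAPTURE.search(text)
    return m.group(1) if m else None

for s in samples:
    print(code_via_lookahead(s), code_via_capture(s))
```

With these inputs both helpers return "ABC-ER-015-30" and "ABC-1234". One caveat worth keeping in mind: if the first word of the name itself contains a dash and starts with an all-uppercase segment (for example a hypothetical "ABC-1234-HEAVY-DUTY Widget"), that segment is indistinguishable from part of the code and gets included in the match.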
Upvotes: 2
Reputation: 19641
You may use the following pattern:
^(?:[A-Z0-9]+-?)+?(?=-\S+[ ])
Breakdown:
^ # Beginning of the string.
(?: # Start of a non-capturing group.
[A-Z0-9]+ # Any uppercase letter or a digit repeated one or more times.
-? # An optional hyphen character.
)+? # End of the non-capturing group, repeated one or more times (lazily).
(?= # Start of a positive Lookahead.
- # Matches a hyphen character literally.
\S+ # Any non-whitespace character repeated one or more times.
[ ] # Matches a space character.
) # End of the lookahead.
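The breakdown above maps almost directly onto Python's `re.VERBOSE` mode, which ignores whitespace and `#` comments inside the pattern. The sketch below assumes Python and uses hypothetical sample strings; `product_code` is an illustrative helper name.

```python
import re

LAZY = re.compile(
    r"""
    ^               # Beginning of the string.
    (?:             # Start of a non-capturing group.
        [A-Z0-9]+   # One or more uppercase letters or digits.
        -?          # An optional hyphen.
    )+?             # Repeat the group one or more times, lazily.
    (?=             # Start of a positive lookahead.
        -           # A literal hyphen.
        \S+         # One or more non-whitespace characters.
        [ ]         # A space (bracketed so verbose mode keeps it).
    )               # End of the lookahead.
    """,
    re.VERBOSE,
)

def product_code(text):
    """Return the leading product code, or None if the pattern does not match."""
    m = LAZY.match(text)
    return m.group(0) if m else None

# Hypothetical inputs, not taken from the original question.
print(product_code("ABC-ER-015-30-Some Product Name"))  # ABC-ER-015-30
print(product_code("ABC-1234-Another-Item Name"))       # ABC-1234
```

Writing the space as `[ ]` is what makes the pattern safe to use in verbose mode, where a bare space would be ignored.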
Upvotes: 1