Reputation: 75
So I've been trying to come up with a regex that separates these kinds of strings: A100, A-100, A1-100, A1_100, A1A100, "A-100", and many other examples.
The strings always "end" with digits only. I say "end" because they can be wrapped in quotation marks, so the digits aren't technically at the end of the string, but they are followed by a word boundary.
What I need is both parts: whatever comes before the trailing digits, and the digits themselves. I need them separated because I might have to do some addition on the numeric part.
What I've tried is:
At the very start it was easy: A100 was easily separated with something like ([a-zA-Z]+)(\d+), but then I needed to separate A_100, and I need one string that has the A_ and the other the 100, or if it's A1-100, I would need A1- and then the number part 100.
With many iterations of this problem I ended up with this messy regex:
([a-zA-Z\+\.\?\!_\-\\\d]+[a-zA-Z\+\.\?\!_\-\\]+)(\d+)
It separates a lot of the stuff I need EXCEPT for the simpler A100, because if the first part of the string has a number in it (like A1A100) then that number needs to be followed by something other than a digit, or else I would just get A1 and A100. But this is very messy, and I would rather do something simple like ([^\n])(\d+) (this obviously doesn't work): capture any run of characters other than newlines, then capture the digits that end the string.
I tried to use lookaheads, but I'm not very good with them. ((?=\d+)\d+) gets me just the number part of A100, but I can't for the life of me manage to combine it with a pattern for the rest of the string.
All of this with an implementation that works with C# and .NET. Any guidance?
Upvotes: 2
Views: 193
Reputation: 19661
You may use the following pattern:
\b([A-Za-z]+(?:[A-Za-z0-9]*[A-Za-z_\-])?)(\d+)\b
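In C#, applying the pattern could look roughly like the sketch below. The inputs are the samples from the question; the `TrySplit` helper and its parameter names are made up for illustration and are not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from this answer: group 1 is the prefix, group 2 the trailing digits.
var pattern = new Regex(@"\b([A-Za-z]+(?:[A-Za-z0-9]*[A-Za-z_\-])?)(\d+)\b");

// Hypothetical helper: splits the first match into its prefix and digit parts.
bool TrySplit(string input, out string prefix, out string digits)
{
    Match m = pattern.Match(input);
    prefix = m.Success ? m.Groups[1].Value : "";
    digits = m.Success ? m.Groups[2].Value : "";
    return m.Success;
}

foreach (var s in new[] { "A100", "A-100", "A1-100", "A1_100", "A1A100", "\"A-100\"" })
{
    if (TrySplit(s, out var prefix, out var digits))
        Console.WriteLine($"{s}: prefix '{prefix}', digits '{digits}'");
    else
        Console.WriteLine($"{s}: no match");
}
```

With these inputs the prefixes should come out as A, A-, A1-, A1_, A1A, and A-, each paired with 100.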
Details:
\b - Word boundary.
( - Start of group 1.
[A-Za-z]+ - Match one or more letters.
(?: - Start of a non-capturing group.
[A-Za-z0-9]* - Match zero or more alphanumeric characters.
[A-Za-z_\-] - Match a single letter, underscore, or hyphen.
)? - Close the non-capturing group and make it optional.
) - Close group 1.
(\d+) - Match one or more digits and capture them in group 2.
\b - Word boundary.
Note: It's not entirely clear from your question what characters are accepted. This assumes letters, digits, an underscore, and a hyphen. Feel free to add more characters in the appropriate character class if you need to support more.
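As a rough illustration of both points (supporting more characters, and doing arithmetic on the numeric part as the question mentions), here is a sketch in C#. The extra characters + . ? ! are an assumption taken from the question's own attempt, and `AddToNumber` is a hypothetical helper name.

```csharp
using System;
using System.Text.RegularExpressions;

// Same shape as the pattern above, with + . ? ! added to both inner classes
// (an assumption; adjust the classes to whatever your data actually needs).
var extended = new Regex(@"\b([A-Za-z]+(?:[A-Za-z0-9+.?!_\-]*[A-Za-z+.?!_\-])?)(\d+)\b");

// Hypothetical helper: adds `delta` to the numeric part of every match,
// leaving the prefix (group 1) and any surrounding text untouched.
string AddToNumber(string input, int delta) =>
    extended.Replace(input, m =>
        m.Groups[1].Value + (int.Parse(m.Groups[2].Value) + delta));

Console.WriteLine(AddToNumber("A1-100", 5));      // A1-105
Console.WriteLine(AddToNumber("\"A.b+100\"", 1)); // "A.b+101"
```

Note that parsing to int drops leading zeros (A007 plus 1 becomes A8) and will throw on digit runs too long for an int, so padding or a wider type may be needed depending on your data.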
Upvotes: 4