Reputation: 539
I want to use split and regular expressions together to separate special codes in a line. This is my line:
14S15T3C16W17A0-20m0-7T
Now I want to separate out each item. Items look like 14S, 15T, 7T, etc.: each one consists of a run of digits of arbitrary length followed by a single letter.
For example: 125125125125125X or 11T.
There is one exception, 0-, which must remain as it is and also be separated out as its own item.
I have made a regular expression myself:
Dim digits() As String = Regex.Split(line, "([0-9][A-Z]|0-)")
The problem is that it only matches one digit of each code. For example, if the line is 11T2B13D, it splits it like this: 1, 1T, 2B, 1, 3D
How can I solve this problem?
Upvotes: 3
Views: 3267
Reputation: 56829
Since each token ends with either a single letter or a dash - (for the 0- case), the line can be split using Regex.Split with this regex:
(?<=[-a-zA-Z])
(?<=pattern) is a zero-width (no text is consumed) positive look-behind: it matches if the text before the current position matches the pattern inside.
The regex above simply checks whether the character before the current position is a letter (upper or lower case), a-zA-Z, or a dash -, and if so, splits at the current position.
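A minimal VB.NET sketch of the look-behind split, assuming a plain console program. The module name and the `SplitCodes` helper are illustrative and not part of the question. Because the look-behind also matches at the very end of the line (right after the final letter), Regex.Split returns a trailing empty string, so the sketch filters out empty entries.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookBehindSplitDemo
    ' Hypothetical helper: splits at every position whose preceding
    ' character is a letter or a dash, then drops empty entries
    ' (the last match sits at the end of the string and yields "").
    Function SplitCodes(line As String) As String()
        Dim parts() As String = Regex.Split(line, "(?<=[-a-zA-Z])")
        Return Array.FindAll(parts, Function(p As String) p.Length > 0)
    End Function

    Sub Main()
        Dim tokens() As String = SplitCodes("14S15T3C16W17A0-20m0-7T")
        Console.WriteLine(String.Join(", ", tokens))
        ' prints: 14S, 15T, 3C, 16W, 17A, 0-, 20m, 0-, 7T
    End Sub
End Module
```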
Alternatively, you can do this with Regex.Matches using this regex:
[0-9]+[A-Za-z]|0-
Since the number can be arbitrarily long, you need the one-or-more quantifier +. The rest should be clear, since it is very close to what you tried.
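A minimal VB.NET sketch of the Regex.Matches approach, again assuming a console program; `ExtractCodes` is an illustrative name. Instead of deciding where to cut, it simply collects every substring that matches the token pattern.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module MatchesDemo
    ' Hypothetical helper: collects each substring that is either
    ' one or more digits followed by one letter, or the literal "0-".
    Function ExtractCodes(line As String) As List(Of String)
        Dim codes As New List(Of String)
        For Each m As Match In Regex.Matches(line, "[0-9]+[A-Za-z]|0-")
            codes.Add(m.Value)
        Next
        Return codes
    End Function

    Sub Main()
        Console.WriteLine(String.Join(", ", ExtractCodes("14S15T3C16W17A0-20m0-7T")))
        ' prints: 14S, 15T, 3C, 16W, 17A, 0-, 20m, 0-, 7T
        Console.WriteLine(String.Join(", ", ExtractCodes("11T2B13D")))
        ' prints: 11T, 2B, 13D
    End Sub
End Module
```

On "0-20m", the first alternative fails at the dash, so the engine falls back to the literal 0- and then picks up 20m as a separate match.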
Both methods should give the same result for valid input (according to your specification). However, when the input is invalid, the Regex.Split approach will produce invalid tokens, while the Regex.Matches approach produces only valid tokens (it skips invalid characters and sequences).
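To illustrate the difference on invalid input, here is a small VB.NET sketch that runs both approaches on a made-up line containing a stray # and a trailing digit with no letter. `BySplit` and `ByMatches` are illustrative names, and the sample line is invented for the demonstration.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module InvalidInputDemo
    ' Approach 1: split after every letter or dash, dropping empty entries.
    Function BySplit(line As String) As String()
        Return Regex.Split(line, "(?<=[-a-zA-Z])").Where(Function(p) p.Length > 0).ToArray()
    End Function

    ' Approach 2: keep only substrings that match the token pattern.
    Function ByMatches(line As String) As String()
        Return Regex.Matches(line, "[0-9]+[A-Za-z]|0-").Cast(Of Match)().Select(Function(m) m.Value).ToArray()
    End Function

    Sub Main()
        ' "#" and the trailing "3" do not belong to any valid code.
        Dim line As String = "11T#2B3"
        Console.WriteLine(String.Join(" | ", BySplit(line)))   ' 11T | #2B | 3
        Console.WriteLine(String.Join(" | ", ByMatches(line))) ' 11T | 2B
    End Sub
End Module
```

The split version keeps every character, so the junk ends up glued to or standing beside real codes; the matches version silently drops whatever does not fit the pattern.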
Upvotes: 3
Reputation: 2654
If your goal is to split 11T2B13D into 11T, 2B, 13D, then you need to change your regular expression so that it matches 1 or more digits. So use [0-9]+[A-Z]|0-; if the + operator (which means one or more) does not exist, try [0-9][0-9]*[A-Z]|0- instead (* means zero or more).
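A sketch of plugging the corrected pattern back into the question's original Regex.Split call, assuming a console program; `SplitWithPattern` is an illustrative name. Because the question's pattern is wrapped in a capturing group, .NET's Regex.Split copies each matched code into the result array, interleaved with empty strings for the (empty) text between adjacent codes; filtering those out is an extra step not mentioned in this answer. Note that [A-Z] only accepts uppercase letters.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module FixedSplitDemo
    ' Hypothetical helper: runs Regex.Split with a capturing-group pattern
    ' (as in the question) and drops the empty strings between codes.
    Function SplitWithPattern(line As String, pattern As String) As String()
        Dim digits() As String = Regex.Split(line, pattern)
        Return Array.FindAll(digits, Function(s As String) s.Length > 0)
    End Function

    Sub Main()
        ' "[0-9]+" and "[0-9][0-9]*" both mean "one or more digits".
        For Each pattern As String In {"([0-9]+[A-Z]|0-)", "([0-9][0-9]*[A-Z]|0-)"}
            Console.WriteLine(String.Join(", ", SplitWithPattern("11T2B13D", pattern)))
            ' prints: 11T, 2B, 13D
        Next
    End Sub
End Module
```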
Upvotes: 0