M. T.

Reputation: 539

Regular Expression with Split in VB.NET

I want to use split and regular expressions together to separate special codes in a line. This is my line:

14S15T3C16W17A0-20m0-7T

Now I want to separate out each item. The items could be, e.g., 14S, 15T, 7T, etc. Each item consists of a run of digits of arbitrary length followed by a single letter:

E.g.: 125125125125125X or 11T.

There is also an exception which is the 0- and these will remain as they are, and must be separated out too.

I have made a regular expression myself:

Dim digits() As String = Regex.Split(line, "([0-9][A-Z]|0-)")

But the problem is that it only takes 1 digit of the combination, for example, if the line is 11T2B13D, it will separate it like this: 1, 1T, 2B, 1, 3D

How can I solve this problem?

Upvotes: 3

Views: 3267

Answers (2)

nhahtdh

Reputation: 56829

Since each token ends with a single letter or a dash - (in the case of 0-), the line can be split using Regex.Split with this regex:

(?<=[-a-zA-Z])

(?<=pattern) is a zero-width (no text is consumed) positive look-behind. It matches if the text immediately before the current position matches the pattern inside.

The regex above just checks that the character before the current position is a letter (upper or lower case, a-zA-Z) or a dash -, and splits at the current position.
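
As a rough sketch of the look-behind split applied to the line from the question (the module name SplitDemo and the variable names are only illustrative): because the last token ends the string, the look-behind also matches at the very end, so Regex.Split returns a trailing empty string, which is filtered out here.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module SplitDemo
    Sub Main()
        ' Sample line taken from the question.
        Dim line As String = "14S15T3C16W17A0-20m0-7T"

        ' Split at every position that is preceded by a letter or a dash.
        Dim parts() As String = Regex.Split(line, "(?<=[-a-zA-Z])")

        ' The match at the end of the input yields a trailing empty string; drop it.
        Dim tokens() As String = parts.Where(Function(s) s.Length > 0).ToArray()

        ' Prints: 14S 15T 3C 16W 17A 0- 20m 0- 7T
        Console.WriteLine(String.Join(" ", tokens))
    End Sub
End Module
```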


Alternatively, you can do this with Regex.Matches with this regex:

[0-9]+[A-Za-z]|0-

Since the number can be arbitrarily long, you need the "1 or more" quantifier +. The rest should be clear, since it is very close to what you have tried.
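
A minimal sketch of the Regex.Matches variant on the same line from the question (the module name MatchesDemo is only illustrative). At each 0- the first alternative fails, because a dash rather than a letter follows the digit, so the engine falls back to the second alternative.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module MatchesDemo
    Sub Main()
        ' Sample line taken from the question.
        Dim line As String = "14S15T3C16W17A0-20m0-7T"

        ' Collect every token: one or more digits plus a letter, or the literal 0-.
        Dim tokens As New List(Of String)()
        For Each m As Match In Regex.Matches(line, "[0-9]+[A-Za-z]|0-")
            tokens.Add(m.Value)
        Next

        ' Prints: 14S 15T 3C 16W 17A 0- 20m 0- 7T
        Console.WriteLine(String.Join(" ", tokens))
    End Sub
End Module
```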


Both methods should have the same effect for valid input (according to your specification). However, when the input is invalid, the Regex.Split approach will produce invalid tokens, while the Regex.Matches approach produces only valid tokens (it skips invalid characters and sequences).
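
To illustrate the difference, here is a small sketch using a made-up malformed line, 11T??2B (the ?? is hypothetical invalid input, not something from the question; the module name is likewise illustrative):

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module InvalidInputDemo
    Sub Main()
        ' Hypothetical malformed input: "??" is not part of any valid token.
        Dim line As String = "11T??2B"

        ' Split keeps the stray characters attached to the following token.
        Dim parts() As String = Regex.Split(line, "(?<=[-a-zA-Z])")
        Dim splitTokens() As String = parts.Where(Function(s) s.Length > 0).ToArray()
        ' Prints: Split:   11T, ??2B
        Console.WriteLine("Split:   " & String.Join(", ", splitTokens))

        ' Matches returns only text that fits the token pattern and skips "??".
        Dim matchTokens As New List(Of String)()
        For Each m As Match In Regex.Matches(line, "[0-9]+[A-Za-z]|0-")
            matchTokens.Add(m.Value)
        Next
        ' Prints: Matches: 11T, 2B
        Console.WriteLine("Matches: " & String.Join(", ", matchTokens))
    End Sub
End Module
```

Which behaviour is preferable depends on whether silently dropping bad input is acceptable; if it is not, the Split result makes malformed tokens visible so they can be validated afterwards.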

Upvotes: 3

Rob

Reputation: 2654

If your goal is to split 11T2B13D into 11T 2B 13D, then you need to change your regular expression so that it matches one or more digits. So use [0-9]+[A-Z]|0-. If the + operator (which means one or more) is not available, try [0-9][0-9]*[A-Z]|0- instead (* means zero or more).

Upvotes: 0
