Michael Green

Reputation: 27

What can I do to prevent certain characters being matched in regex?

I need help with a regex to prevent characters other than '-' and ',' from being recognised in conjunction with numbers.

I have a stream of data consisting of numbers (digits 0 to 9) delimited by commas. A number can never start with a 0 (the stream could contain 10 but not 01). Some of these delimited packets are hyphenated. An example of the groups of numbers is as follows:

12-34,56,78-90,12,34-45,67-8,90

I need the regex to return a group for each comma-delimited section, i.e.:

Group 1: 12-34

Group 2: 56

Group 3: 78-90

Group 4: 12

Group 5: 34-45

Group 6: 67-8

Group 7: 90

So far I have this pattern:

[1-9]+\d*(?:-[1-9]+\d*)?(?=,|$)

The problem is that if the numbers contain a spurious character other than a number, '-' or ',' the group is partially recognised:

12£34,56,78-90,12,34-45,67-8,90

Group 1: 34

How can I fix this? I am using VBA for this. Thanks

Upvotes: 1

Views: 41

Answers (2)

Wiktor Stribiżew

Reputation: 626738

You may use

(?:^|,)([1-9]\d*(?:-[1-9]\d*)?)(?=,|$)
^^^^^^^

See the regex demo.

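The main point is the (?:^|,) group, which matches the start of string (^) or (|) a comma, so a match can only begin at a packet boundary. Note I removed + from [1-9] to lessen the amount of backtracking: the \d* that follows already matches any further digits.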

Details

  • (?:^|,) - start of string or ,
  • ([1-9]\d*(?:-[1-9]\d*)?) - Capturing group 1 (access it via match.SubMatches(0)):
    • [1-9]\d* - a digit from 1 to 9 and then zero or more digits
    • (?:-[1-9]\d*)? - an optional sequence of -, a digit from 1 to 9 and then zero or more digits
  • (?=,|$) - a positive lookahead that requires a comma or the end of string immediately to the right, without consuming it.
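
To see why the leading (?:^|,) matters, here is a minimal sketch that runs the pattern from the question and the revised pattern against the same malformed input. The names ComparePatterns and SAMPLE are made up for illustration, and late binding via CreateObject("VBScript.RegExp") is assumed to be available so that no library reference is needed.

```vba
Sub ComparePatterns()
    Dim rx As Object, m As Object
    Dim patterns As Variant, p As Variant
    Const SAMPLE As String = "12L34,56"   ' L stands in for any stray character

    ' First entry: pattern from the question. Second entry: revised pattern.
    patterns = Array("[1-9]+\d*(?:-[1-9]+\d*)?(?=,|$)", _
                     "(?:^|,)([1-9]\d*(?:-[1-9]\d*)?)(?=,|$)")

    Set rx = CreateObject("VBScript.RegExp")
    rx.Global = True

    For Each p In patterns
        rx.Pattern = p
        Debug.Print "Pattern: " & p
        For Each m In rx.Execute(SAMPLE)
            Debug.Print "  matched: " & m.Value
        Next m
    Next p
End Sub
```

With the pattern from the question, 34 and 56 are both reported, because nothing stops a match from starting in the middle of a packet. With the revised pattern, only ,56 is reported (the whole match includes the leading comma, which is why the packet itself is read from SubMatches(0)).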

VBA test:

Sub Test()
    ' Requires a reference to "Microsoft VBScript Regular Expressions 5.5"
    Dim s As String, rx As RegExp
    Dim ms As MatchCollection, m As Match

    s = "12L34,56,78-90,12,34-45,67-8,90"
    Set rx = New RegExp
    rx.Pattern = "(?:^|,)([1-9]\d*(?:-[1-9]\d*)?)(?=,|$)"
    rx.Global = True                    ' find every packet, not just the first
    Set ms = rx.Execute(s)
    If ms.Count > 0 Then
        For Each m In ms
            Debug.Print m.SubMatches(0) ' Group 1: the packet without the leading comma
        Next
    End If
End Sub

Output:

56
78-90
12
34-45
67-8
90
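
If the packets are needed elsewhere in the code and not just printed, the same pattern could be wrapped in a small function. This is only a sketch: ExtractPackets and DemoExtractPackets are hypothetical names, and late binding via CreateObject("VBScript.RegExp") is assumed so that it runs without setting a library reference.

```vba
' Hypothetical helper: returns the valid packets of a comma-delimited stream.
Function ExtractPackets(ByVal stream As String) As Collection
    Dim rx As Object, m As Object
    Dim result As New Collection

    Set rx = CreateObject("VBScript.RegExp")
    rx.Global = True
    rx.Pattern = "(?:^|,)([1-9]\d*(?:-[1-9]\d*)?)(?=,|$)"

    For Each m In rx.Execute(stream)
        result.Add m.SubMatches(0)   ' packet without the leading comma
    Next m

    Set ExtractPackets = result
End Function

Sub DemoExtractPackets()
    Dim packet As Variant
    ' Well-formed sample from the question
    For Each packet In ExtractPackets("12-34,56,78-90,12,34-45,67-8,90")
        Debug.Print packet
    Next packet
End Sub
```

For the well-formed sample from the question this prints the seven packets 12-34, 56, 78-90, 12, 34-45, 67-8 and 90, one per line. A malformed packet is left out, so the returned collection simply has fewer items.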

Upvotes: 2

Emma

Reputation: 27723

My guess is that you might just be looking for a simple expression, such as

^(\d+-\d+|(?:\d+\D)\d+),(\d+),(\d+-\d+|(?:\d+\D)\d+),(\d+),(\d+-\d+|(?:\d+\D)\d+),(\d+-\d+|(?:\d+\D)\d+),(\d+)$

to fulfill your grouping requirements for groups 1 to 7; you can also add an optional group for any \D characters that may appear within your strings.
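
A rough sketch of how this expression could be tried from VBA follows. The pattern is assembled from its two repeating pieces only to keep the line short; TestSevenGroups, hyphenated and plain are illustrative names, and late binding via CreateObject("VBScript.RegExp") is assumed.

```vba
Sub TestSevenGroups()
    Dim rx As Object, ms As Object, m As Object
    Dim hyphenated As String, plain As String
    Dim i As Long

    hyphenated = "(\d+-\d+|(?:\d+\D)\d+)"   ' e.g. 12-34, or digits split by one non-digit
    plain = "(\d+)"                         ' e.g. 56

    Set rx = CreateObject("VBScript.RegExp")
    ' Same seven-group layout as the expression above
    rx.Pattern = "^" & Join(Array(hyphenated, plain, hyphenated, plain, _
                                  hyphenated, hyphenated, plain), ",") & "$"

    Set ms = rx.Execute("12-34,56,78-90,12,34-45,67-8,90")
    If ms.Count = 0 Then
        Debug.Print "No match"
    Else
        Set m = ms.Item(0)
        For i = 0 To m.SubMatches.Count - 1
            Debug.Print "Group " & (i + 1) & ": " & m.SubMatches(i)
        Next i
    End If
End Sub
```

Note that this expression only fits streams with exactly seven packets in this hyphenated/plain order. Its \D alternative also accepts a stray character inside a packet, so 12£34 would be captured whole instead of rejected, and \d+ does not enforce the no-leading-zero rule.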

DEMO

Upvotes: 0
