Reputation: 27
I need help with a regex to prevent characters other than '-' and ',' from being recognised in conjunction with numbers.
I have a stream of data consisting of numbers (digits 0 to 9) delimited by commas. A number can never start with a 0 (10 is valid, 01 is not). Some of the delimited sections can be hyphenated. An example of the groups of numbers is as follows:
12-34,56,78-90,12,34-45,67-8,90
I need the regex to return a group for each comma-delimited section, i.e.:
Group 1: 12-34
Group 2: 56
Group 3: 78-90
Group 4: 12
Group 5: 34-45
Group 6: 67-8
Group 7: 90
So far I have this pattern:
[1-9]+\d*(?:-[1-9]+\d*)?(?=,|$)
The problem is that if a section contains a spurious character other than a digit, '-' or ',', the group is partially recognised:
12£34,56,78-90,12,34-45,67-8,90
Group 1: 34
How can I fix this? I am using VBA for this. Thanks.
Upvotes: 1
Views: 41
Reputation: 626738
You may use
(?:^|,)([1-9]\d*(?:-[1-9]\d*)?)(?=,|$)
^^^^^^^
The main point is the (?:^|,) group that matches start of string (^) or (|) a comma. Note I removed + from [1-9] to lessen the amount of backtracking.
Details
(?:^|,) - start of string or ,
([1-9]\d*(?:-[1-9]\d*)?) - Capturing group 1 (access it via match.SubMatches(0)):
  [1-9]\d* - a digit from 1 to 9 and then any 0+ digits
  (?:-[1-9]\d*)? - an optional sequence of -, a digit from 1 to 9 and then any 0+ digits
(?=,|$) - comma or end of string.
VBA test:
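Sub Test()
    ' Requires a reference to "Microsoft VBScript Regular Expressions 5.5"
    Dim s As String, rx As RegExp
    Dim ms As MatchCollection, m As Match
    s = "12L34,56,78-90,12,34-45,67-8,90"
    Set rx = New RegExp
    rx.Pattern = "(?:^|,)([1-9]\d*(?:-[1-9]\d*)?)(?=,|$)"
    rx.Global = True    ' find every match, not only the first
    Set ms = rx.Execute(s)
    ' For Each simply does nothing when there are no matches
    For Each m In ms
        Debug.Print m.SubMatches(0)    ' group 1, without the leading comma
    Next
End Sub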
Output:
56
78-90
12
34-45
67-8
90
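As a variation on the test above, the same pattern can be wrapped in a reusable function. This is only a sketch: the names ExtractGroups and DemoExtractGroups are hypothetical, and it assumes late binding through CreateObject("VBScript.RegExp") so that no library reference has to be set.

```vba
' Hypothetical helper: returns every valid comma-delimited section of "data".
' A section is valid if it is a number, or two numbers joined by one hyphen,
' with no leading zeros. Anything else is skipped.
Function ExtractGroups(ByVal data As String) As Collection
    Dim rx As Object, m As Object
    Dim result As New Collection

    ' Late binding: assumes the VBScript regex engine is available on the machine
    Set rx = CreateObject("VBScript.RegExp")
    rx.Pattern = "(?:^|,)([1-9]\d*(?:-[1-9]\d*)?)(?=,|$)"
    rx.Global = True

    For Each m In rx.Execute(data)
        result.Add m.SubMatches(0)    ' group 1, without the leading comma
    Next m

    Set ExtractGroups = result
End Function

Sub DemoExtractGroups()
    Dim item As Variant
    ' Same input as the test above; the first section contains a stray "L"
    For Each item In ExtractGroups("12L34,56,78-90,12,34-45,67-8,90")
        Debug.Print item
    Next item
End Sub
```

Because the leading comma is consumed by (?:^|,), the function reads SubMatches(0) and not the whole match value. A section such as 12L34 is dropped entirely, since 34 is preceded by neither the start of the string nor a comma.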
Upvotes: 2
Reputation: 27723
My guess is that you might just be looking for a simple expression, such as
^(\d+-\d+|(?:\d+\D)\d+),(\d+),(\d+-\d+|(?:\d+\D)\d+),(\d+),(\d+-\d+|(?:\d+\D)\d+),(\d+-\d+|(?:\d+\D)\d+),(\d+)$
to capture your groups 1 to 7; you can also add an optional group for any \D chars that may appear anywhere within your strings.
Upvotes: 0