Dave Stuart

Reputation: 547

Regex for Uppercase Letters, Numbers and dashes only

I have struggled with this expression for 2 days now so I thought I'd ask for some proper help from the world of knowledge. I hope someone can help.

This is the RegEx I built to get me what I want.

\S*\d*?-[A-Z]*[0-9]*

I only want the Uppercase Letters and Numbers with dashes, so it does get GC-113, AO-1-GC-113, AO-2-GC-113, which is great!

"I don't want this ------, but this is good GC-113, AO-1-GC-113, AO-2-GC-113"

BUT if the codes are not separated by a space, only by another character such as a comma or a period, then it returns a single match on the entire section "GC-113,AO-1-GC-113,AO-2-GC-113"

"I don't want this ------, but this is good GC-113,AO-1-GC-113,AO-2-GC-113"

I'm using RegexBuddy to try and figure this out.

This is the VBA code I'm using to get the matches.

Public Function GetRIs(ByVal vstrInString As String) As Collection
    Dim myRegExp As RegExp
    Dim myMatches As Variant
    Dim myMatch As Variant

    Set GetRIs = New Collection
    Set myRegExp = New RegExp

    myRegExp.Global = True
    myRegExp.Pattern = "\S*\d*?-[A-Z]*[0-9]*"
    Set myMatches = myRegExp.Execute(vstrInString)

    For Each myMatch In myMatches
        If myMatch.Value <> "" Then
            GetRIs.Add myMatch.Value
        End If
    Next

End Function

Thanks! Dave

Upvotes: 1

Views: 6677

Answers (1)

Wiktor Stribiżew

Reputation: 627380

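Your \S*\d*?-[A-Z]*[0-9]* pattern can match even a single hyphen: only the - is obligatory, and every other subpattern can match zero times (that is, be absent from the string). The leading \S* is also greedy and matches any non-whitespace character, commas and periods included, which is why it swallows "GC-113,AO-1-GC-113,AO-2-GC-113" as one match when there are no spaces.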
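To see that behaviour concretely, here is a small check of the original pattern against the two sample sentences from the question. It is only a sketch: it uses Python's re module rather than the VBScript RegExp object, on the assumption that \S, \d, lazy quantifiers and character classes behave the same way for this plain ASCII input. The function name find_original is made up for the illustration.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"\S*\d*?-[A-Z]*[0-9]*")


def find_original(text):
    """Return every match of the question's pattern in text (hypothetical helper)."""
    return [m.group(0) for m in ORIGINAL.finditer(text)]


spaced = "I don't want this ------, but this is good GC-113, AO-1-GC-113, AO-2-GC-113"
packed = "I don't want this ------, but this is good GC-113,AO-1-GC-113,AO-2-GC-113"

# With spaces, the codes come out separately, but the run of dashes matches too.
print(find_original(spaced))
# ['------', 'GC-113', 'AO-1-GC-113', 'AO-2-GC-113']

# Without spaces, \S* runs across the commas and the codes collapse into one match.
print(find_original(packed))
# ['------', 'GC-113,AO-1-GC-113,AO-2-GC-113']

# A lone hyphen is a complete match, since everything else is optional.
print(ORIGINAL.fullmatch("-") is not None)
# True
```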

You can use

myRegExp.Pattern = "\b[A-Z0-9]+(?:-[A-Z0-9]+)+"

The pattern matches:

  • \b - a word boundary (before the next letter or digit there must be a non-word character or the start of the string)
  • [A-Z0-9]+ - one or more uppercase letters or digits
  • (?:-[A-Z0-9]+)+ - one or more sequences of:
    • - - a hyphen
    • [A-Z0-9]+ - one or more uppercase letters or digits
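
As a quick sanity check of the breakdown above, the suggested pattern can be run over the comma-separated sample from the question. Again this is a sketch in Python's re module, not the VBScript RegExp object, assuming \b and (?:...) behave the same for this ASCII input; find_codes is a hypothetical helper name.

```python
import re

# The pattern suggested in this answer.
SUGGESTED = re.compile(r"\b[A-Z0-9]+(?:-[A-Z0-9]+)+")


def find_codes(text):
    """Return every hyphenated uppercase/digit code found in text (hypothetical helper)."""
    return [m.group(0) for m in SUGGESTED.finditer(text)]


packed = "I don't want this ------, but this is good GC-113,AO-1-GC-113,AO-2-GC-113"

# The dashes are skipped (no letter or digit to start on), and each code is
# returned on its own because a comma cannot be part of a match.
print(find_codes(packed))
# ['GC-113', 'AO-1-GC-113', 'AO-2-GC-113']

# A bare run of hyphens produces no match at all.
print(find_codes("------"))
# []
```

Because at least one -[A-Z0-9]+ group is required, a plain word or number without a hyphen (for example "I" or "113") is not returned either.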

Upvotes: 2
