T-Bone

RegExp other patterns not working

I am still trying to do string-format matching with RegExp in VBScript and VB6. This time I want to match a short, single-line string with this format:

  1. Seven characters, either:

    a. six alphanumerics plus one "-", or

    b. five alphanumerics plus two "-"

  2. Three digits

  3. Two letters

  4. Literal "65"

  5. A two-digit hex number

Examples include 123456-789LM65F2, 4EF789-012XY65A5, A2345--789AB65D0 and 23456--890JK65D0.

The RegExp pattern ([A-Z0-9\-]{12})([65][A-F0-9]{2}) lumps items (1) through (3) together and finds all four examples.

However, if I try to:

c) break out (3) with the pattern ([A-Z0-9\-]{10})([A-Z]{2})([65][A-F0-9]{2}),

d) break out both (2) and (3) with the pattern ([A-Z0-9\-]{7})([0-9]{3})([A-Z]{2})([65][A-F0-9]{2}), or

e) tighten up (1) with the alternation pattern ([A-Z0-9]{5}[-]{2}|[A-Z0-9]{6}[-]{1})([0-9]{3})([A-Z]{2})([65][A-F0-9]{2}),

it refuses to find any of them.

What am I doing wrong? Following is a VBScript that runs and checks these.

' VB Script
Main()

Function Main() ' RegEx_Format_sample.vbs
    'Uses two patterns: TestPttn for the full format-accuracy check and SplitPttn
    'to separate the two desired pieces

    Dim reSet, EtchTemp, arrSplit, sTemp
    Dim sBoule, sSlice, idx, TestPttn, SplitPttn, arrMatch 
    Dim arrPttn(3), arrItems(3), idxItem, idxPttn, Msgtemp

    Set reSet = New RegExp
    ' reSet.IgnoreCase = True ' Not using
    ' reSet.Global = True ' Not using

    ' load test case formats to check & split
    arrItems(0) = "0,6 nums + 1 '-',123456-789LM65F2" 
    arrItems(1) = "1,6 chars + 1 '-',4EF789-012XY65A5"
    arrItems(2) = "2,5 chars + 2 '-',A2345--789AB65D0"
    arrItems(3) = "3,5 nums + 2 '-',23456--890JK65D0"

    SplitPttn = "([A-Z0-9]{5,6})[-]{1,2}([A-Z0-9]{9})" ' split pattern has never failed to work

    ' load the patterns to try
    arrPttn(0) =  "([A-Z0-9\-]{12})([65][A-F0-9]{2})"
    arrPttn(1) =  "([A-Z0-9\-]{10}[A-Z]{2})([65][A-F0-9]{2})"
    arrPttn(2) =  "([A-Z0-9\-]{7})([0-9]{3})([A-Z]{2})([65][A-F0-9]{2})"
    arrPttn(3) =  "([A-Z0-9]{5}[-]{2}|[A-Z0-9]{6}[-]{1})([0-9]{3})([A-Z]{2})([65][A-F0-9]{2})"

    For idxPttn = 0 To 3 ' select Test pattern
        TestPttn = arrPttn(idxPttn)

        TestPttn = TestPttn & "[%]" ' append % "ender" char 
        SplitPttn = SplitPttn & "[%]" ' append % "ender" char 

        For idxItem = 0 To 3
            reSet.Pattern = TestPttn ' set to Test pattern
            sTemp = arrItems(idxItem )
            arrSplit = Split(sTemp, ",")  '  arrSplit is Split array
            EtchTemp = arrSplit(2) & "%" ' append % "ender" char to Item sub (2) as the "phrase" under test

            If reSet.Test(EtchTemp) = False Then
                MsgBox("RegEx " & TestPttn & " false for " & EtchTemp & " as " & arrSplit(1) )
            Else ' test OK; now switch to SplitPttn 
                reSet.Pattern = SplitPttn 
                Set arrMatch = reSet.Execute(EtchTemp) ' run Pttn as Exec this time
                If arrMatch.Count > 0 Then ' if the split pattern matched, Count should be > 0
                    Msgtemp = ""
                    Msgtemp = "RegEx " & TestPttn & " TRUE for " & EtchTemp & " as " & arrSplit(1) 
                    For idx = 0 To arrMatch.Item(0).Submatches.Count - 1
                        Msgtemp = Msgtemp & Chr(13) & Chr(10) & "Split segment " & idx & " as " & arrMatch.Item(0).submatches.Item(idx) 
                    Next
                    MsgBox(Msgtemp)
                End If ' Count OK
            End If ' test OK
        Next ' idxItem 
    Next  ' idxPttn 
End Function


Answers (3)

T-Bone

All, thanks again for your help!

trincot, everything between the commas in each arrItems() entry, including the "plus", is just a shorthand description of that item's characteristics, such as "5 characters plus 2 dashes".

Gurman, your pattern breakdowns were helpful, but if I read it right, the added ? prefix means "match zero or one occurrences", and this must match exactly one occurrence. Also, my first pattern (the one matching 12 characters) actually DID work for all my test cases.
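Why the first pattern still finds every example: a minimal sketch, assuming the script is run with cscript, using pattern 0 with the "%" ender appended exactly as the question's loop does. At index 0, `[65]` takes the 6 of "65", `[A-F0-9]{2}` takes "5F", and `[%]` then meets the final "2", so that attempt fails. Because the pattern has no `^` anchor, the engine retries one character later, where the twelve-character run ends in the 6 and `[65]` takes the 5. `FirstMatchIndex` is a hypothetical helper.

```vbscript
Option Explicit

' Hypothetical helper: zero-based index of the first match of pattern p in s, or -1 if none.
Function FirstMatchIndex(p, s)
    Dim re, m
    Set re = New RegExp
    re.Pattern = p
    Set m = re.Execute(s)
    If m.Count = 0 Then
        FirstMatchIndex = -1
    Else
        FirstMatchIndex = m(0).FirstIndex
    End If
End Function

Dim pttn0, sample
pttn0 = "([A-Z0-9\-]{12})([65][A-F0-9]{2})[%]"   ' pattern 0 plus the "%" ender
sample = "123456-789LM65F2%"

' Unanchored: the match starts at index 1, not 0
WScript.Echo "Unanchored: " & FirstMatchIndex(pttn0, sample)
' Anchored with ^: the shifted match is no longer allowed, so nothing matches
WScript.Echo "Anchored:   " & FirstMatchIndex("^" & pttn0, sample)
```

In other words, the match it reports is "23456-789LM65F2%", one character to the right of where the code really starts.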

jNevill and JMichelB, your suggestions are very close to what I ended up with.
I was "over-classing". After some tinkering, I got the test pattern to recognize these test cases by taking the 65 out of the [] in my original alternation pattern. That is, I went from ([65]) to (65) and, Zammo!, it worked.

Original pattern: ([A-Z0-9]{5}[-]{2}|[A-Z0-9]{6}[-]{1})([0-9]{3})([A-Z]{2})([65][A-F0-9]{2})

Working pattern: ([A-Z0-9]{5}[-]{2}|[A-Z0-9]{6}[-]{1})([0-9]{3})([A-Z]{2})(65)([A-F0-9]{2})
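The difference between the two patterns comes down to one rule: inside square brackets, `[65]` is a character class that matches exactly one character (a 6 or a 5), whereas `65` outside brackets matches the two-character sequence. A minimal sketch, assuming cscript; `FullMatch` is a hypothetical helper that anchors the pattern so it must cover the whole string.

```vbscript
Option Explicit

' Hypothetical helper: True if the whole string s matches pattern p.
' The ^ and $ anchors force the pattern to cover the entire input.
Function FullMatch(p, s)
    Dim re
    Set re = New RegExp
    re.Pattern = "^(?:" & p & ")$"
    FullMatch = re.Test(s)
End Function

' [65] is a class: one character, either "6" or "5"
WScript.Echo "[65] vs ""6"":  " & FullMatch("[65]", "6")
WScript.Echo "[65] vs ""65"": " & FullMatch("[65]", "65")
' 65 without brackets is the literal two-character sequence
WScript.Echo "65 vs ""65"":   " & FullMatch("65", "65")
WScript.Echo "65 vs ""6"":    " & FullMatch("65", "6")
```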

Oh, and I moved the statement SplitPttn = SplitPttn & "[%]" up out of the For...Next loop. That helped with the splitting.

T-Bone
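Why moving that statement helps: inside the For...Next loop it runs once per test pattern, so from the second pass on SplitPttn ends in `[%][%]` (and later `[%][%][%]`) and requires several "%" characters in a row, which the test strings never contain. A minimal sketch, assuming cscript; `Finds` is a hypothetical helper.

```vbscript
Option Explicit

' Hypothetical helper: True if pattern p finds a match anywhere in s.
Function Finds(p, s)
    Dim re
    Set re = New RegExp
    re.Pattern = p
    Finds = re.Test(s)
End Function

Dim basePttn, SplitPttn, idxPttn, sample
basePttn = "([A-Z0-9]{5,6})[-]{1,2}([A-Z0-9]{9})"   ' the question's split pattern
sample = "123456-789LM65F2%"

' Append inside the loop (as in the question): one more [%] on every pass
SplitPttn = basePttn
For idxPttn = 0 To 3
    SplitPttn = SplitPttn & "[%]"
    WScript.Echo "Inside loop, pass " & idxPttn & ": " & Finds(SplitPttn, sample)
Next

' Append once, before the loop: the pattern is the same on every pass
SplitPttn = basePttn & "[%]"
For idxPttn = 0 To 3
    WScript.Echo "Before loop, pass " & idxPttn & ": " & Finds(SplitPttn, sample)
Next
```

Only pass 0 of the first loop finds a match; every pass of the second loop does.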


Gurmanjot Singh

Try this Regex:

(?:[A-Z0-9]{6}-|[A-Z0-9]{5}--)[0-9]{3}[A-Z]{2}65[0-9A-F]{2}


Explanation:

  • (?:[A-Z0-9]{6}-|[A-Z0-9]{5}--) - a non-capturing group that matches either 6 alphanumeric characters followed by - or 5 alphanumeric characters followed by --
  • [0-9]{3} - matches 3 digits
  • [A-Z]{2} - matches 2 letters
  • 65 - matches 65 literally
  • [0-9A-F]{2} - matches 2 hex digits
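The `?:` right after the opening parenthesis is not the "zero or one" quantifier: `(?:...)` is a non-capturing group, so the alternation must still match exactly once; it is simply not stored in SubMatches. A minimal sketch comparing the two forms, assuming cscript; `SubmatchCount` is a hypothetical helper.

```vbscript
Option Explicit

' Hypothetical helper: number of SubMatches in the first match, or -1 if there is no match.
Function SubmatchCount(p, s)
    Dim re, m
    Set re = New RegExp
    re.Pattern = p
    Set m = re.Execute(s)
    If m.Count = 0 Then
        SubmatchCount = -1
    Else
        SubmatchCount = m(0).SubMatches.Count
    End If
End Function

Dim sample
sample = "123456-789LM65F2"

' (?:...) groups the alternation without capturing it
WScript.Echo "(?:...) submatches: " & _
    SubmatchCount("(?:[A-Z0-9]{6}-|[A-Z0-9]{5}--)[0-9]{3}[A-Z]{2}65[0-9A-F]{2}", sample)
' (...) groups the same alternation and also captures it as SubMatches(0)
WScript.Echo "(...) submatches:   " & _
    SubmatchCount("([A-Z0-9]{6}-|[A-Z0-9]{5}--)[0-9]{3}[A-Z]{2}65[0-9A-F]{2}", sample)
```

Both patterns match the same strings; only the number of captured groups differs (0 versus 1).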

You can get some idea from the following code:

VBScript Code:

Option Explicit
Dim objReg, strTest
strTest = "123456-789LM65F2"          'Change the value as per your requirements. You can also store a list of values in an array and run the code in loop
set objReg = new RegExp
objReg.Global = True
objReg.IgnoreCase = True
objReg.Pattern = "(?:[A-Z0-9]{6}-|[A-Z0-9]{5}--)[0-9]{3}[A-Z]{2}65[0-9A-F]{2}"
if objReg.test(strTest) then
    msgbox strTest & " matches the pattern"
else
    msgbox strTest & " does not match the pattern"
end if
set objReg = Nothing
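The comment in the code above suggests running the check over an array. A sketch of that, assuming cscript: it adds `^` and `$` anchors (not in the answer) so a string with extra leading or trailing characters is rejected instead of being matched in part, and it leaves IgnoreCase at its default False on the assumption that the codes are always upper case. `IsValidCode` and the last two sample strings are hypothetical.

```vbscript
Option Explicit

' Hypothetical helper: True only if the entire string has the required format.
' ^ and $ anchor the pattern so extra leading/trailing characters are rejected.
Function IsValidCode(s)
    Dim re
    Set re = New RegExp
    re.IgnoreCase = False   ' assumption: letters are always upper case
    re.Pattern = "^(?:[A-Z0-9]{6}-|[A-Z0-9]{5}--)[0-9]{3}[A-Z]{2}65[0-9A-F]{2}$"
    IsValidCode = re.Test(s)
End Function

Dim arrTest, strTest
' The four examples from the question, plus two made-up strings that should fail
arrTest = Array("123456-789LM65F2", "4EF789-012XY65A5", _
                "A2345--789AB65D0", "23456--890JK65D0", _
                "123456-789LM66F2", "X123456-789LM65F2")

For Each strTest In arrTest
    If IsValidCode(strTest) Then
        WScript.Echo strTest & " matches the pattern"
    Else
        WScript.Echo strTest & " does not match the pattern"
    End If
Next
```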

Your patterns do not work because [65] is a character class: it matches one character, either 6 or 5, not the literal 65. In detail:

([A-Z0-9\-]{12})([65][A-F0-9]{2}) - matches 12 occurrences of either an AlphaNumeric character or - followed by either 6 or 5 followed by 2 HEX characters

([A-Z0-9\-]{10}[A-Z]{2})([65][A-F0-9]{2}) - matches 10 occurrences of either an AlphaNumeric character or - followed by 2 Letters followed by either 6 or 5 followed by 2 HEX characters

([A-Z0-9\-]{7})([0-9]{3})([A-Z]{2})([65][A-F0-9]{2}) - matches 7 occurrences of either an AlphaNumeric character or - followed by 3 digits followed by 2 Letters followed by either 6 or 5 followed by 2 HEX characters

([A-Z0-9]{5}[-]{2}|[A-Z0-9]{6}[-]{1})([0-9]{3})([A-Z]{2})([65][A-F0-9]{2}) - matches either 5 occurrences of an AlphaNumeric character followed by -- or 6 occurrences of an Alphanumeric followed by a -. This is then followed by 3 digits followed by 2 Letters followed by either 6 or 5 followed by 2 HEX characters
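A sketch that makes the failure of the alternation pattern visible, assuming cscript: with the "%" ender removed the pattern does match, but only 15 of the 16 characters, because the last group ([65][A-F0-9]{2}) captures "65F" and leaves the final hex digit behind. With the ender in place, `[%]` meets that leftover "2", so nothing matches. Replacing [65] with the literal 65 makes the whole string match. `MatchedText` is a hypothetical helper.

```vbscript
Option Explicit

' Hypothetical helper: text of the first match of pattern p in s, or "" if none.
Function MatchedText(p, s)
    Dim re, m
    Set re = New RegExp
    re.Pattern = p
    Set m = re.Execute(s)
    If m.Count = 0 Then
        MatchedText = ""
    Else
        MatchedText = m(0).Value
    End If
End Function

Dim pttn3, pttnFixed
pttn3 = "([A-Z0-9]{5}[-]{2}|[A-Z0-9]{6}[-]{1})([0-9]{3})([A-Z]{2})([65][A-F0-9]{2})"
pttnFixed = "([A-Z0-9]{5}[-]{2}|[A-Z0-9]{6}[-]{1})([0-9]{3})([A-Z]{2})(65)([A-F0-9]{2})"

' Without the ender: matches, but stops one character short ("123456-789LM65F")
WScript.Echo "No ender:   """ & MatchedText(pttn3, "123456-789LM65F2") & """"
' With the ender: [%] meets "2" instead of "%", so nothing matches
WScript.Echo "With ender: """ & MatchedText(pttn3 & "[%]", "123456-789LM65F2%") & """"
' Literal 65: the whole string, including the ender, matches
WScript.Echo "Literal 65: """ & MatchedText(pttnFixed & "[%]", "123456-789LM65F2%") & """"
```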


JMichelB

Try this pattern:

(([A-Z0-9]{5}--)|([A-Z0-9]{6}-))[0-9]{3}[A-Z]{2}65[0-9A-F]{2}

Or, if the engine does not accept the [A-F] range in the last part:

(([A-Z0-9]{5}--)|([A-Z0-9]{6}-))[0-9]{3}[A-Z]{2}65[0-9ABCDEF]{2}
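VBScript's RegExp supports character ranges such as A-F inside brackets, so the two patterns above should behave identically. A quick check, assuming cscript, that runs both classes over every hex digit plus a few characters neither should accept; `ClassesAgree` is a hypothetical helper.

```vbscript
Option Explicit

' Hypothetical helper: True if [0-9A-F] and [0-9ABCDEF] accept exactly the same characters in chars.
Function ClassesAgree(chars)
    Dim reRange, reList, i, ch
    Set reRange = New RegExp
    Set reList = New RegExp
    reRange.Pattern = "^[0-9A-F]$"
    reList.Pattern = "^[0-9ABCDEF]$"
    ClassesAgree = True
    For i = 1 To Len(chars)
        ch = Mid(chars, i, 1)
        If reRange.Test(ch) <> reList.Test(ch) Then ClassesAgree = False
    Next
End Function

' All 16 hex digits plus a few characters that neither class should accept
WScript.Echo "Classes agree: " & ClassesAgree("0123456789ABCDEFGaf-%")
```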
