user2348235

Reputation: 57

regex doesn't get street name

I'm parsing a text like this:

T-SHIRT SIZE 34CM BUSINESS LOCATED: MONTANA 356

I have made this regex:

([A-Z]+) (\d\d\d\d\d|\d\d\d\d|\d\d\d|\d\d)

It matches:

SIZE 34

But I want it to match:

MONTANA 356

Can you help me get it?

To be more explicit: I want to avoid matching "size 34" because it is immediately followed by another character. I would like the regex to match only when the string I want is followed by a space (' ') or a newline (\n).

Upvotes: 0

Views: 220

Answers (3)

Lars Eriks

Reputation: 37

I happen to be learning regular expressions in Excel VBA. It is hard to answer without seeing the code that runs your regex. In VBA, the pattern matches both "size 34" and "Montana 356", as the first and second items in the MatchCollection. Could it be that you are only returning the first match?

Update: I use this as a test function.

' Requires a reference to "Microsoft VBScript Regular Expressions 5.5".
' Returns each match (m) with its start index (i) and length (c), or 0 if nothing matches.
Function RegExpTest(patrn As String, strTest As String) As Variant
    Dim regex As New VBScript_RegExp_55.RegExp
    Dim Match As Match, Matches As MatchCollection
    Dim cnt As Integer, cmb() As Variant
    If patrn <> "" Then
        With regex
            .Global = True      ' collect every match, not only the first
            .MultiLine = True
            .IgnoreCase = True  ' "size 34" and "SIZE 34" are treated alike
            .Pattern = patrn
        End With
        If regex.Test(strTest) Then
            Set Matches = regex.Execute(strTest)
            cnt = Matches.Count
            ' Three output slots per match: value, index, length
            ReDim cmb((cnt * 3) - 1)
            Dim i As Integer: i = 0
            For Each Match In Matches
                cmb(i) = " m:" & Match.Value & ","
                i = i + 1
                cmb(i) = "i:" & Match.FirstIndex & ","
                i = i + 1
                cmb(i) = "c:" & Match.Length & " |"
                i = i + 1
'                cmb(i) = "sub:" & Match.SubMatches.Count & "|"
'                i = i + 1
            Next
            RegExpTest = Join(cmb)
        Else
            RegExpTest = 0
        End If
    End If
    Set regex = Nothing
End Function
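
For readers not using VBA, the same check can be sketched in Python. This is only an illustration: the question does not say which language is in use, so Python's re module is an assumption, and the variable names are made up for the example. It shows that the original pattern finds both strings, and that reading only the first result gives "SIZE 34".

```python
import re

# Sample line from the question.
text = "T-SHIRT SIZE 34CM BUSINESS LOCATED: MONTANA 356"

# The question's original pattern, unchanged.
pattern = re.compile(r"([A-Z]+) (\d\d\d\d\d|\d\d\d\d|\d\d\d|\d\d)")

# search() stops at the first match, like reading only the first
# item of a MatchCollection.
first = pattern.search(text).group(0)

# finditer() walks every match, like Global = True in the VBA code.
all_matches = [m.group(0) for m in pattern.finditer(text)]

print(first)        # SIZE 34
print(all_matches)  # ['SIZE 34', 'MONTANA 356']
```

So the pattern itself does reach "MONTANA 356"; the unwanted "SIZE 34" simply comes first.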

Upvotes: 1

Pravin Umamaheswaran

Reputation: 704

Can you try using this expression? ([\w]+)\s(\d\d\d\d\d|\d\d\d\d|\d\d\d|\d\d)\b
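
A quick way to try this expression is sketched below in Python (an assumption, since the question does not name a language). The trailing \b rejects "SIZE 34" because "34" runs straight into "CM". One thing to keep in mind is that \w also accepts digits and underscores, which the second, hypothetical input illustrates.

```python
import re

# Sample line from the question.
text = "T-SHIRT SIZE 34CM BUSINESS LOCATED: MONTANA 356"

# The expression suggested in this answer, unchanged.
suggested = re.compile(r"([\w]+)\s(\d\d\d\d\d|\d\d\d\d|\d\d\d|\d\d)\b")

match = suggested.search(text)
name_and_number = match.groups()
print(match.group(0))    # MONTANA 356
print(name_and_number)   # ('MONTANA', '356')

# Hypothetical input, not from the question: \w lets digits and
# underscores into the "name" part as well.
extra = suggested.search("UNIT_7 120").group(0)
print(extra)             # UNIT_7 120
```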

Upvotes: 1

Filkolev

Reputation: 460

Here is a modification that should work: ([A-Za-z]+) \b(\d{2,5})\b

You need to specify which characters are valid for the name (I included upper- and lower-case letters). I also use the {2,5} shorthand to ask for between 2 and 5 digits.

The critical part is surrounding the number with word boundaries, \b. Does this solve your issue?
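
Here is a small Python sketch of the pattern above (Python is an assumption, the question does not say which language is used). It also includes a stricter variant, not part of this answer, that uses a lookahead to require whitespace or the end of the text after the number, which is closer to the "only when followed by ' ' or \n" wording in the question. The names boundary, strict, and punctuated are made up for the example.

```python
import re

# Sample line from the question.
text = "T-SHIRT SIZE 34CM BUSINESS LOCATED: MONTANA 356"

# Pattern from this answer: word boundaries around the number.
boundary = re.compile(r"([A-Za-z]+) \b(\d{2,5})\b")

# Stricter variant (an assumption, not from the answer): the number must be
# followed by whitespace or the end of the text.
strict = re.compile(r"([A-Za-z]+) (\d{2,5})(?=\s|$)")

print(boundary.search(text).group(0))  # MONTANA 356
print(strict.search(text).group(0))    # MONTANA 356

# Hypothetical input where the two differ: a comma counts as a word
# boundary, but it is neither whitespace nor the end of the text.
punctuated = "MONTANA 356, NEXT"
print(boundary.search(punctuated).group(0))  # MONTANA 356
print(strict.search(punctuated))             # None
```

Which of the two is appropriate depends on whether numbers followed by punctuation should count as matches.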

Upvotes: 1
