SIM

Reputation: 22440

How to use multiple patterns within one regex object?

I've written a script in VBA that uses regular expressions to parse the company name, phone and fax from a webpage. When I run the script, I get that information flawlessly. However, I've used three different expressions, and to make them work I created three different regex objects: rxp, rxp1 and rxp2.

My question: how can I create a single regex object and use all three patterns with it, instead of what I've done below?

This is the script (working one):

Sub GetInfo()
    Const Url$ = "https://www.austrade.gov.au/SupplierDetails.aspx?ORGID=ORG0120000508&folderid=1736"
    Dim rxp As New RegExp, rxp1 As New RegExp, rxp2 As New RegExp

    With New XMLHTTP60
        .Open "GET", Url, False
        .send

        rxp.Pattern = "Company Name:(\s[\w\s]+)"
        rxp1.Pattern = "Phone:(\s\+[\d\s]+)"
        rxp2.Pattern = "Fax:(\s\+[\d\s]+)"

        If rxp.Execute(.responseText).Count > 0 Then
            [A1] = rxp.Execute(.responseText).Item(0).SubMatches(0)
        End If

        If rxp1.Execute(.responseText).Count > 0 Then
            [B1] = rxp1.Execute(.responseText).Item(0).SubMatches(0)
        End If

        If rxp2.Execute(.responseText).Count > 0 Then
            [C1] = rxp2.Execute(.responseText).Item(0).SubMatches(0)
        End If
    End With
End Sub

References to add in order to run the above script:

Microsoft XML, v6.0
Microsoft VBScript Regular Expressions 5.5

Upvotes: 2

Views: 3645

Answers (4)

SIM

Reputation: 22440

I think the following does the same while declaring rxp only once, by reassigning its Pattern before each search:

Sub GetInfo()
    Const Url$ = "https://www.austrade.gov.au/SupplierDetails.aspx?ORGID=ORG0120000508&folderid=1736"
    Dim Http As New XMLHTTP60, rxp As New RegExp

    With Http
        .Open "GET", Url, False
        .send
    End With

    With rxp
        .Pattern = "Company Name:(\s[\w\s]+)"
        If .Execute(Http.responseText).Count > 0 Then
            [A1] = .Execute(Http.responseText)(0).SubMatches(0)
        End If

        .Pattern = "Phone:(\s\+[\d\s]+)"
        If .Execute(Http.responseText).Count > 0 Then
            [B1] = .Execute(Http.responseText)(0).SubMatches(0)
        End If

        .Pattern = "Fax:(\s\+[\d\s]+)"
        If .Execute(Http.responseText).Count > 0 Then
            [C1] = .Execute(Http.responseText)(0).SubMatches(0)
        End If
    End With
End Sub
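
A possible refinement of the approach above, as a minimal sketch: the repeated pattern/check/read steps can be moved into a small helper so that each search runs Execute only once. The names FirstSubMatch and GetInfoWithHelper are hypothetical, not part of the original script. Note one behavioural difference: when a pattern finds nothing, this version writes an empty string to the cell instead of leaving it untouched.

```vba
' Hypothetical helper: returns the first capture group of the first match,
' or an empty string when the pattern does not match.
Function FirstSubMatch(ByVal rxp As RegExp, ByVal source As String, ByVal pat As String) As String
    Dim ms As MatchCollection
    rxp.Pattern = pat
    Set ms = rxp.Execute(source)        ' run the search once and reuse the result
    If ms.Count > 0 Then FirstSubMatch = ms(0).SubMatches(0)
End Function

Sub GetInfoWithHelper()
    Const Url$ = "https://www.austrade.gov.au/SupplierDetails.aspx?ORGID=ORG0120000508&folderid=1736"
    Dim Http As New XMLHTTP60, rxp As New RegExp
    Dim html As String

    With Http
        .Open "GET", Url, False
        .send
        html = .responseText            ' keep the response in a local variable
    End With

    ' One RegExp object, three patterns applied in turn
    [A1] = FirstSubMatch(rxp, html, "Company Name:(\s[\w\s]+)")
    [B1] = FirstSubMatch(rxp, html, "Phone:(\s\+[\d\s]+)")
    [C1] = FirstSubMatch(rxp, html, "Fax:(\s\+[\d\s]+)")
End Sub
```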

Upvotes: 0

Julio

Reputation: 5308

You can do it, but I'm not sure it is a good idea. Merging the patterns into one regex makes it more error-prone.

If you match all three fields at once, every one of them must be present or the regex will fail. Worse, it may capture the wrong data. What happens if the fax is an optional field? See here for examples.

Also, a change to the page template will break things more easily. Say the template changes and the fax is rendered before the phone: the whole regex fails, because searching for the three fields at once implies a fixed order.

Unless the fields you are searching for are related or depend on each other, I wouldn't go down that route.
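
To make the optional-field problem concrete, here is a small sketch. The merged pattern is just the question's three patterns joined in order, and the sample text is made up for illustration (it is not the real page). It assumes a reference to Microsoft VBScript Regular Expressions 5.5.

```vba
Sub MergedPatternDemo()
    Dim rxp As New RegExp
    Dim allFields As String, noFax As String

    ' Hypothetical merged pattern: the three labels must appear in this exact order
    rxp.Pattern = "Company Name:(\s[\w\s]+)Phone:(\s\+[\d\s]+)Fax:(\s\+[\d\s]+)"

    ' Made-up sample text, one field per line
    allFields = "Company Name: Acme Stud" & vbLf & _
                "Phone: +61 7 0000 0000" & vbLf & _
                "Fax: +61 7 1111 1111"
    noFax = "Company Name: Acme Stud" & vbLf & _
            "Phone: +61 7 0000 0000"

    Debug.Print rxp.Test(allFields)   ' True: all three fields present, in order
    Debug.Print rxp.Test(noFax)       ' False: the missing fax fails the whole match

    ' A single-field pattern still finds the phone in the same text
    rxp.Pattern = "Phone:(\s\+[\d\s]+)"
    Debug.Print rxp.Test(noFax)       ' True
End Sub
```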

Upvotes: 0

Wiktor Stribiżew

Reputation: 627199

You may build a regex with alternatives, enable global matching with rxp.Global = True, and capture the known labels into Group 1 and the unknown values into Group 2. You can then assign the right value to each variable by checking the value of Group 1:

Sub GetInfo()
    Const Url$ = "https://www.austrade.gov.au/SupplierDetails.aspx?ORGID=ORG0120000508&folderid=1736"
    Dim rxp As New RegExp
    Dim ms As MatchCollection
    Dim m As Match
    Dim cname As String, phone As String, fax As String

    With New XMLHTTP60
        .Open "GET", Url, False
        .send

        ' Group 1 is the label, Group 2 is the value that follows it
        rxp.Pattern = "(Phone|Company Name|Fax):\s*(\+?[\w\s]*\w)"
        rxp.Global = True   ' return every match, not just the first

        Set ms = rxp.Execute(.responseText)
        For Each m In ms
            ' Route each value by the label captured in Group 1
            If m.SubMatches(0) = "Company Name" Then cname = m.SubMatches(1)
            If m.SubMatches(0) = "Phone" Then phone = m.SubMatches(1)
            If m.SubMatches(0) = "Fax" Then fax = m.SubMatches(1)
        Next

        Debug.Print cname, phone, fax
    End With
End Sub

Output:

Vaucraft Braford Stud       +61 7 4942 4859              +61 7 4942 0618

See the regex demo.

Pattern details:

  • (Phone|Company Name|Fax) - Capturing group 1: any of the three alternatives
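  • :\s* - a colon and then zero or more whitespace characters
  • (\+?[\w\s]*\w) - Capturing group 2:
    • \+? - an optional +
    • [\w\s]* - zero or more letters, digits, underscores or whitespace characters
    • \w - a single letter, digit or underscore, so the captured value never ends in whitespace.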

Upvotes: 4

emsimpson92

Reputation: 1778

Company Name:\s*(.*)\n?Phone:\s*(.*)\n?Fax:\s*(.*)\n? will capture the values into three capture groups. You can see how it works here.

Group 1 is your company name, group 2 is your phone number, and group 3 is your fax.
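
As a rough sketch of how those groups could be read in VBA: SubMatches is zero-based, so group 1 is SubMatches(0). The sample text below is made up and assumes each field sits on its own line; against the raw HTML of a real page, (.*) would also capture whatever markup follows the value on the same line.

```vba
Sub ThreeGroupsDemo()
    Dim rxp As New RegExp
    Dim ms As MatchCollection
    Dim sample As String

    rxp.Pattern = "Company Name:\s*(.*)\n?Phone:\s*(.*)\n?Fax:\s*(.*)\n?"

    ' Made-up sample text, one field per line
    sample = "Company Name: Acme Stud" & vbLf & _
             "Phone: +61 7 0000 0000" & vbLf & _
             "Fax: +61 7 1111 1111"

    Set ms = rxp.Execute(sample)
    If ms.Count > 0 Then
        With ms(0)
            Debug.Print .SubMatches(0)   ' group 1: company name
            Debug.Print .SubMatches(1)   ' group 2: phone
            Debug.Print .SubMatches(2)   ' group 3: fax
        End With
    End If
End Sub
```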

Upvotes: 0
