Reputation: 22440
I've written a script in VBA, in combination with regular expressions, to parse the company name, phone, and fax from a webpage. When I run my script I get that information flawlessly. However, I've used three different expressions, and to make them work I created three different regex objects: rxp, rxp1, and rxp2.
My question: how can I create one regex object with which I can use all three patterns, unlike what I've done below?
This is the working script:
Sub GetInfo()
    Const Url$ = "https://www.austrade.gov.au/SupplierDetails.aspx?ORGID=ORG0120000508&folderid=1736"
    Dim rxp As New RegExp, rxp1 As New RegExp, rxp2 As New RegExp
    With New XMLHTTP60
        .Open "GET", Url, False
        .send
        rxp.Pattern = "Company Name:(\s[\w\s]+)"
        rxp1.Pattern = "Phone:(\s\+[\d\s]+)"
        rxp2.Pattern = "Fax:(\s\+[\d\s]+)"
        If rxp.Execute(.responseText).Count > 0 Then
            [A1] = rxp.Execute(.responseText).Item(0).SubMatches(0)
        End If
        If rxp1.Execute(.responseText).Count > 0 Then
            [B1] = rxp1.Execute(.responseText).Item(0).SubMatches(0)
        End If
        If rxp2.Execute(.responseText).Count > 0 Then
            [C1] = rxp2.Execute(.responseText).Item(0).SubMatches(0)
        End If
    End With
End Sub
References to add to the project to run the above script:
Microsoft XML, v6.0
Microsoft VBScript Regular Expressions
Upvotes: 2
Views: 3645
Reputation: 22440
I think the following does the same while declaring rxp only once:
Sub GetInfo()
    Const Url$ = "https://www.austrade.gov.au/SupplierDetails.aspx?ORGID=ORG0120000508&folderid=1736"
    Dim Http As New XMLHTTP60, rxp As New RegExp
    With Http
        .Open "GET", Url, False
        .send
    End With
    With rxp
        .Pattern = "Company Name:(\s[\w\s]+)"
        If .Execute(Http.responseText).Count > 0 Then
            [A1] = .Execute(Http.responseText)(0).SubMatches(0)
        End If
        .Pattern = "Phone:(\s\+[\d\s]+)"
        If .Execute(Http.responseText).Count > 0 Then
            [B1] = .Execute(Http.responseText)(0).SubMatches(0)
        End If
        .Pattern = "Fax:(\s\+[\d\s]+)"
        If .Execute(Http.responseText).Count > 0 Then
            [C1] = .Execute(Http.responseText)(0).SubMatches(0)
        End If
    End With
End Sub
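The repeated "set pattern, execute, check count" steps could also be folded into a small helper, so each search runs only once. This is just a sketch: FirstSubMatch and GetInfoWithHelper are hypothetical names I made up, and it needs the same two references as the script above. Note one behavioural difference: when nothing matches, it writes an empty string to the cell instead of leaving it untouched.

```vba
' Hypothetical helper (not part of the original script): returns the first
' capture group of pat in text, or "" when there is no match.
Function FirstSubMatch(rxp As RegExp, pat As String, text As String) As String
    Dim ms As MatchCollection
    rxp.Pattern = pat
    Set ms = rxp.Execute(text)          ' run the search once and reuse the result
    If ms.Count > 0 Then FirstSubMatch = ms(0).SubMatches(0)
End Function

Sub GetInfoWithHelper()
    Const Url$ = "https://www.austrade.gov.au/SupplierDetails.aspx?ORGID=ORG0120000508&folderid=1736"
    Dim Http As New XMLHTTP60, rxp As New RegExp, html As String
    With Http
        .Open "GET", Url, False
        .send
        html = .responseText            ' keep the response in a local variable
    End With
    ' One RegExp object, three patterns (same patterns as in the question)
    [A1] = FirstSubMatch(rxp, "Company Name:(\s[\w\s]+)", html)
    [B1] = FirstSubMatch(rxp, "Phone:(\s\+[\d\s]+)", html)
    [C1] = FirstSubMatch(rxp, "Fax:(\s\+[\d\s]+)", html)
End Sub
```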
Upvotes: 0
Reputation: 5308
You can do it, but I'm not sure it is a good idea. Merging the regexps makes the result more prone to errors.
If you match all three values at the same time, all of them must be present or the regexp will fail. Even worse, it may fetch the wrong data. What happens if the fax is an optional field? See here for examples.
Also, if the page template changes, things will break more easily. Say the template changes and the fax is rendered before the telephone: the whole regexp will fail, because searching for all three values at once implies a fixed order.
Unless the values you are searching for are related to or depend on each other, I wouldn't go down that route.
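To make the ordering problem concrete, here is a small sketch that runs against made-up sample strings rather than the real page. The combined pattern and the sub name OrderDependenceDemo are my own illustration, not something taken from the question.

```vba
Sub OrderDependenceDemo()
    Dim rxp As New RegExp
    Dim inOrder As String, swapped As String, noFax As String

    ' Made-up sample texts (assumption: plain text, one field per line)
    inOrder = "Phone: +61 7 4942 4859" & vbLf & "Fax: +61 7 4942 0618"
    swapped = "Fax: +61 7 4942 0618" & vbLf & "Phone: +61 7 4942 4859"
    noFax = "Phone: +61 7 4942 4859"

    ' A single pattern that expects Phone first and Fax second
    rxp.Pattern = "Phone:\s*(\+[\d ]+)\s*Fax:\s*(\+[\d ]+)"

    Debug.Print rxp.Test(inOrder)   ' True:  both fields present, in the expected order
    Debug.Print rxp.Test(swapped)   ' False: same data, but Fax comes first
    Debug.Print rxp.Test(noFax)     ' False: Fax is missing, so Phone is lost too
End Sub
```

With separate patterns, the phone would still be found in all three cases.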
Upvotes: 0
Reputation: 627199
You may build a regex with alternatives, enable global matching with rxp.Global = True, and capture the known strings into Group 1 and the unknown parts into Group 2. Then you can assign the right values to your variables by checking the value of Group 1:
Const Url$ = "https://www.austrade.gov.au/SupplierDetails.aspx?ORGID=ORG0120000508&folderid=1736"
Dim rxp As New RegExp
Dim ms As MatchCollection
Dim m As Match
Dim cname As String, phone As String, fax As String
With New XMLHTTP60
    .Open "GET", Url, False
    .send
    rxp.Pattern = "(Phone|Company Name|Fax):\s*(\+?[\w\s]*\w)"
    rxp.Global = True
    Set ms = rxp.Execute(.responseText)
    For Each m In ms
        If m.SubMatches(0) = "Company Name" Then cname = m.SubMatches(1)
        If m.SubMatches(0) = "Phone" Then phone = m.SubMatches(1)
        If m.SubMatches(0) = "Fax" Then fax = m.SubMatches(1)
    Next
    Debug.Print cname, phone, fax
End With
Output:
Vaucraft Braford Stud +61 7 4942 4859 +61 7 4942 0618
See the regex demo.
Pattern details:
(Phone|Company Name|Fax) - Capturing group 1: any of the three alternatives
:\s* - a colon and then 0+ whitespaces
(\+?[\w\s]*\w) - Capturing group 2:
    \+? - an optional +
    [\w\s]* - 0 or more letters, digits, _ or whitespaces
    \w - a single letter, digit or _.
Upvotes: 4
Reputation: 1778
Company Name:\s*(.*)\n?Phone:\s*(.*)\n?Fax:\s*(.*)\n?
will capture the values into three capture groups. You can see how it works here.
Group 1 is your company name, group 2 is your phone number, and group 3 is your fax.
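In VBA the three groups are read through SubMatches(0) to SubMatches(2). The sketch below runs the pattern against a made-up plain-text sample, assuming the three fields sit on consecutive lines in exactly this order. The real page's HTML may well have markup between the fields, in which case the pattern would need adjusting. ThreeGroupDemo is a hypothetical name.

```vba
Sub ThreeGroupDemo()
    Dim rxp As New RegExp, ms As MatchCollection
    Dim sample As String

    ' Made-up sample text (assumption: one field per line, in this order)
    sample = "Company Name: Vaucraft Braford Stud" & vbLf & _
             "Phone: +61 7 4942 4859" & vbLf & _
             "Fax: +61 7 4942 0618"

    rxp.Pattern = "Company Name:\s*(.*)\n?Phone:\s*(.*)\n?Fax:\s*(.*)\n?"
    Set ms = rxp.Execute(sample)

    If ms.Count > 0 Then
        With ms(0)
            Debug.Print .SubMatches(0)   ' group 1: company name
            Debug.Print .SubMatches(1)   ' group 2: phone
            Debug.Print .SubMatches(2)   ' group 3: fax
        End With
    Else
        Debug.Print "No match: a field is missing or out of order"
    End If
End Sub
```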
Upvotes: 0