Reputation: 22440
I've written a script using vba in combination with IE to parse the contact information from a webpage applying regex on it. I searched a lot but could not find any example that can satiate my requirement. The pattern
may not be ideal to find the phone
number but the main concern here is how I can use the pattern
within vba IE.
Once again: my intention here is to parse the phone number 661-421-5861
from that webpage applying regex
within vba IE.
This is what I've tried so far:
Sub FetchItems()
Const URL$ = "https://www.nafe.com/bakersfield-nafe-network"
Dim IE As New InternetExplorer, HTML As HTMLDocument
Dim rxp As New RegExp, email As Object, Row&
With IE
.Visible = True
.navigate URL
While .Busy = True Or .readyState < 4: DoEvents: Wend
Set HTML = .document
End With
With rxp
.Pattern = "(?<=Phone:)\s*?.*?([^\s]+)"
Set email = .Execute(HTML.body.innerText) 'I'm getting here an error
If email.Count > 0 Then
Row = Row + 1: Cells(Row, 1) = email.Item(0)
End If
End With
IE.Quit
End Sub
When I execute the above script I encounter an error method "Execute" of object "IRegExp2" failed when it hits the line containing Set email = .Execute(HTML.body.innerText)
. How can I make it a go successfully?
Upvotes: 2
Views: 158
Reputation: 12353
Here is a quicker way using xmlhttp object
Sub FetchItems()
Dim URL As String, strBody As String
Dim intS As Long, intE As Long
URL = "https://www.nafe.com/bakersfield-nafe-network"
Dim xml As Object
Set xml = CreateObject("MSXML2.XMLHTTP")
xml.Open "GET", URL, False
xml.send
Dim html As Object
Set html = CreateObject("htmlfile")
html.body.innerHTML = xml.responseText
strBody = html.body.innerHTML
intS = InStr(1, strBody, "Phone:", vbTextCompare) + Len("Phone:")
intE = InStr(intS, strBody, "<", vbTextCompare)
MsgBox Mid(strBody, intS, intE - intS)
End Sub
Upvotes: 2
Reputation: 627220
Note that lookbehinds are not supported by VBA regex. Here, you probably want to capture any digit followed with any amount of digits and hyphens after Phone:
.
You need to re-define the pattern as
rxp.Pattern = "Phone:\s*(\d[-\d]+)"
Then, you need to grab the first match and access its .SubMatches(0)
:
Set email = .Execute(HTML.body.innerText)
If email.Count > 0 Then
Cells(Row+1, 1) = email.Item(0).SubMatches(0)
End If
See the regex in action. The green-highlighted part of sting is what .SubMatches(0)
holds.
Pattern details
Phone:
- a literal substring\s*
- 0+ whitespaces(\d[-\d]+)
- Capturing group 1: a digit, followed with 1+ (due to +
, you may replace with *
to match zero or more) digits or/and hyphens.Upvotes: 2