Reputation: 13
I am using VBA for web scraping. Below are the HTML structure and my VBA code.
When I run it, I get this text: ETA : 2020-08-26 (Reference only, the date will be updated according to shipments).
But I want to scrape only the date from it: 2020-08-26
<div style="font-size: 14px;">
<span class="label" style="font-weight: bolder; font-size: 13px;">ETA : </span>
<br>
2020-08-26
<span style="color: red; font-size: 12px;">(Reference only, the date will be updated according to
shipments).</span>
</div>
VBA code:
Dim ie As New InternetExplorer
Dim doc As New HTMLDocument
ie.navigate "http://127.0.0.1/wordpress/sample-page/"
Do
    DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Set doc = ie.document
Set elems = doc.getElementsByTagName("div")
MsgBox elems(33).innerText
Upvotes: 1
Views: 1335
Reputation: 166331
Dim html, divs, d, c
Set html = CreateObject("htmlfile")
html.body.innerHTML = "<div style='font-size: 14px;'><span class='label' style='font-weight: bolder; font-size: 13px;'>ETA : </span>" & _
    "<br>2020-08-26" & _
    "<span style='color: red; font-size: 12px;'>(Reference only, the date will be updated according to shipments).</span>" & _
    "</div>"
Set divs = html.getElementsByTagName("div")
For Each d In divs
    For Each c In d.ChildNodes
        Debug.Print TypeName(c), c.nodeName, c.NodeValue
    Next c
Next d
Output:
HTMLSpanElement SPAN Null
HTMLBRElement BR Null
DispHTMLDOMTextNode #text 2020-08-26
HTMLSpanElement SPAN Null
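The output shows that the date is the only direct text node (DispHTMLDOMTextNode) of the div. A minimal sketch building on that, assuming the same late-bound "htmlfile" document: a hypothetical helper FirstTextNodeValue returns the first non-blank text node directly under an element by checking NodeType = 3 (the DOM text-node type). Line breaks and tabs are removed before Trim$, because a page loaded from a server usually has newlines around the date and Trim$ only strips spaces.

```vba
' Hypothetical helper: returns the first non-blank text node that is a
' direct child of container, or "" if there is none.
Function FirstTextNodeValue(ByVal container As Object) As String
    Dim c As Object
    Dim v As String
    For Each c In container.ChildNodes
        If c.NodeType = 3 Then                ' 3 = text node
            ' Trim$ only removes spaces, so drop CR/LF/tab first
            v = Replace(Replace(Replace(c.NodeValue, vbCr, ""), vbLf, ""), vbTab, "")
            v = Trim$(v)
            If Len(v) > 0 Then
                FirstTextNodeValue = v
                Exit Function
            End If
        End If
    Next c
End Function

Sub DemoFirstTextNode()
    Dim html As Object
    Set html = CreateObject("htmlfile")
    html.body.innerHTML = "<div><span class='label'>ETA : </span>" & _
        "<br>" & vbLf & "2020-08-26" & vbLf & _
        "<span>(Reference only)</span></div>"
    Debug.Print FirstTextNodeValue(html.getElementsByTagName("div")(0))
End Sub
```

With a live page you would pass the div you already located, e.g. FirstTextNodeValue(elems(33)) in the question's code.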
Upvotes: 1
Reputation: 4467
This code finds any date of the form ####-##-##.
Cells.Clear
s = "ETA : 2020-08-26 (Reference only, the date will be updated according to shipments)."
ReDim a(1 To Len(s))
For i = 1 To Len(s)
    a(i) = IIf(Mid(s, i, 1) Like "#", "#", Mid(s, i, 1))
Next i
fd = "####-##-##"
Cells(1, 1) = s
aa = Join(a, "")
Cells(2, 1) = aa
Cells(3, 1) = Mid(s, InStr(aa, fd), Len(fd))
Cells(3, 1).NumberFormat = "yyyy-mm-dd"
First it splits the string into an array of single characters, replacing every digit with a #, and joins the array back into a template string (aa). Then it uses InStr to find the pattern template fd in aa, and uses the position of that match to return the actual date from the original string s.
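The same Like-based idea can also be written without the intermediate array: slide a 10-character window over the string and test each window against the pattern with Like. A minimal sketch (FindIsoLikeDate is a hypothetical name); like the code above, it only checks the digit layout, so a string such as 2020-99-99 would also match.

```vba
' Hypothetical helper: returns the first substring of s that matches
' the Like pattern "####-##-##", or "" if there is no match.
Function FindIsoLikeDate(ByVal s As String) As String
    Const fd As String = "####-##-##"
    Dim i As Long
    ' If s is shorter than the pattern, the loop body never runs
    For i = 1 To Len(s) - Len(fd) + 1
        If Mid$(s, i, Len(fd)) Like fd Then
            FindIsoLikeDate = Mid$(s, i, Len(fd))
            Exit Function
        End If
    Next i
End Function
```

Returning "" on no match also avoids the run-time error that Mid(s, 0, ...) raises when InStr finds nothing.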
Upvotes: 1
Reputation: 11755
Once you have the string, you can just use a combination of InStr, Mid, and Trim to get the date:
Sub test()
    Dim sSource As String
    Dim nStart As Integer
    Dim nEnd As Integer
    Dim sResult As String
    Dim dtDate As Date
    sSource = "ETA : 2020-08-26 (Reference only, the date will be updated according to shipments)"
    nStart = InStr(sSource, ":")
    nEnd = InStr(sSource, "(")
    sResult = Trim$(Mid$(sSource, nStart + 1, nEnd - nStart - 1))
    If IsDate(sResult) Then
        dtDate = CDate(sResult)
        MsgBox "Success: " & dtDate
    Else
        MsgBox sResult & " is not a date"
    End If
End Sub
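Two caveats with this approach: if there is no "(" after the ":", the length passed to Mid$ becomes negative and VBA raises run-time error 5; and "Success: " & dtDate shows the date in the system's regional format rather than as 2020-08-26. A minimal sketch that guards both cases, wrapped in a hypothetical function EtaDate that returns an empty string when no date is found.

```vba
' Hypothetical helper: returns the date between ":" and "(" as
' yyyy-mm-dd, or "" if there is no ":" or the text is not a date.
Function EtaDate(ByVal sSource As String) As String
    Dim nStart As Long
    Dim nEnd As Long
    Dim sResult As String
    nStart = InStr(sSource, ":")
    If nStart = 0 Then Exit Function
    ' Search for "(" only after the colon
    nEnd = InStr(nStart + 1, sSource, "(")
    If nEnd = 0 Then nEnd = Len(sSource) + 1   ' no note: take the rest
    sResult = Trim$(Mid$(sSource, nStart + 1, nEnd - nStart - 1))
    If IsDate(sResult) Then
        ' Format$ keeps the ISO layout regardless of regional settings
        EtaDate = Format$(CDate(sResult), "yyyy-mm-dd")
    End If
End Function
```

Usage: MsgBox EtaDate(elems(33).innerText) with the question's code.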
Upvotes: 1
Reputation: 2267
You can do that either by string manipulation or by walking the DOM path. Here is a solution using the DOM path.
Sub SelectFromDropdown()
    Dim url As String
    Dim browser As Object
    Dim nodeDiv As Object
    url = "Your URL Here"
    'Initialize Internet Explorer, set visibility,
    'call URL and wait until page is fully loaded
    Set browser = CreateObject("internetexplorer.application")
    browser.Visible = True
    browser.navigate url
    Do Until browser.readyState = 4: DoEvents: Loop
    'Instead of (0), it's (33) in your code.
    'However, I do not recommend using such high indices,
    'as they make the macro fragile: add just one more div tag
    'before the index and the macro will no longer work. This
    'does not apply if you loop through an HTML section that has
    'been selected as a container of exactly these div tags.
    Set nodeDiv = browser.document.getElementsByTagName("div")(0)
    'To get only the date, you can walk the DOM path.
    'You want a text node of the DOM (Document Object Model),
    'so innerText doesn't work (it returns the text of the whole div).
    'You need the NodeValue instead.
    MsgBox nodeDiv.FirstChild.NextSibling.NextSibling.NextSibling.NextSibling.NodeValue
End Sub
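Counting siblings works for this exact markup, but the number of NextSibling steps depends on how many whitespace-only text nodes the browser creates between the tags, and the date's NodeValue may still carry the surrounding line breaks. A slightly sturdier sketch, assuming the "label" span is a direct child of the div as in the question's HTML: find that span by its class, then take the first non-blank text node after it. DateAfterLabel is a hypothetical name.

```vba
' Hypothetical helper: returns the first non-blank text node that
' follows the <span class="label"> child of nodeDiv, or "".
Function DateAfterLabel(ByVal nodeDiv As Object) As String
    Dim n As Object
    Dim v As String
    ' Step 1: find the label span among the direct children
    Set n = nodeDiv.FirstChild
    Do Until n Is Nothing
        If n.NodeType = 1 Then                 ' 1 = element node
            ' Nested If: VBA's And does not short-circuit, and text
            ' nodes have no className property
            If n.className = "label" Then Exit Do
        End If
        Set n = n.NextSibling
    Loop
    If n Is Nothing Then Exit Function
    ' Step 2: skip the <br> and blank text nodes until the date
    Set n = n.NextSibling
    Do Until n Is Nothing
        If n.NodeType = 3 Then                 ' 3 = text node
            v = Trim$(Replace(Replace(Replace(n.NodeValue, vbCr, ""), vbLf, ""), vbTab, ""))
            If Len(v) > 0 Then
                DateAfterLabel = v
                Exit Function
            End If
        End If
        Set n = n.NextSibling
    Loop
End Function
```

In the macro above, MsgBox DateAfterLabel(nodeDiv) would replace the chain of NextSibling calls.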
Upvotes: 0