MattV
MattV

Reputation: 1383

Read and manipulate HTML with Excel VBA

Let's say I have a page as follows, saved at c:\temp\html_page.html:

<html>
   <head>
      <link rel="stylesheet" href="styles.css">
   </head>
   <body>
      <div id="xxx1">
         <img src="test.png">
      </div>
   </body>
</html>

I'd like to programatically adjust the src attribute of the img, based on Excel data & VBA. Basically a way to find the div with Xpath, and adjust the (single) img tag that is contained in it.

I found an example for manipulating XML with VBA through the XML library here, but I've been crunching my head around making this work with the HTML object library; can't find any examples and/or documentation.

Dim XDoc As Object, root As Object

Set XDoc = CreateObject("MSXML2.DOMDocument")
XDoc.async = False: XDoc.validateOnParse = False

If XDoc.Load(html_path) Then
    Debug.Print "Document loaded"
Else
    Dim strErrText As String
    Dim xPE As MSXML2.IXMLDOMParseError
    ' Obtain the ParseError object
    Set xPE = XDoc.parseError
    With xPE
       strErrText = "Your XML Document failed to load" & _
         "due the following error." & vbCrLf & _
         "Error #: " & .ErrorCode & ": " & xPE.reason & _
         "Line #: " & .Line & vbCrLf & _
         "Line Position: " & .linepos & vbCrLf & _
         "Position In File: " & .filepos & vbCrLf & _
         "Source Text: " & .srcText & vbCrLf & _
         "Document URL: " & .URL
    End With
    MsgBox strErrText, vbExclamation

All I want to do is:

'...
Set outer_div = XDoc.SelectFirstNode("//div[id='xxx1'")
... edit the img attribute

But I can't load the HTML page, because it's not proper XML (img tag isn't closed).

Any help is greatly appreciated. Oh, and I can't use other languages such as Python, bummer.

Upvotes: 4

Views: 6549

Answers (2)

gembird
gembird

Reputation: 14053

For this purpose you can use doc.querySelector("div[id='xxx1'] img"). To change src attribute use img.setAttribute "src", "new.png". HTH

Option Explicit

' Add reference to Microsoft Internet Controls (SHDocVw)
' Add reference to Microsoft HTML Object Library

Sub Demo()
    Dim ie As SHDocVw.InternetExplorer
    Dim doc As MSHTML.HTMLDocument
    Dim url As String

    url = "file:///C:/Temp/StackOverflow/html/html_page.html"
    Set ie = New SHDocVw.InternetExplorer
    ie.Visible = True
    ie.navigate url
    While ie.Busy Or ie.readyState <> READYSTATE_COMPLETE: DoEvents: Wend
    Set doc = ie.document

    Dim img As HTMLImg
    Set img = doc.querySelector("div[id='xxx1'] img")
    If Not img Is Nothing Then
        img.setAttribute "src", "new.png"
    End If
    ie.Quit
End Sub

Upvotes: 1

SWa
SWa

Reputation: 4363

This isn't quite what you want, but it may be close enough. Rather than using the XML library, use the HTML library:

Sub changeImg()

    Dim dom As Object
    Dim img As Object
    Dim src As String

    Set dom = CreateObject("htmlFile")

    Open "C:\temp\test.html" For Input As #1
        src = Input$(LOF(1), 1)
    Close #1

    dom.body.innerHTML = src

    Set img = dom.getelementsbytagname("img")(0)

    img.src = "..."

    Open "C:\temp\test.html" For Output As #1
        Print #1, dom.DocumentElement.outerHTML
    Close #1


End Sub

The issue is that the resulting file will have Head nodes added and the tag names will be in upper case. If you can live with this, the solution will work for you.

As an aside if you want to do things a bit more in depth, with better selectors consider early binding. The HTML interface exposed is different than the interface when late bound and supports more properties - you'll want to add a reference to the HTML Object Library:

Sub changeImg()

    Dim dom As HTMLDocument
    Dim img As Object
    Dim src As String

    Set dom = CreateObject("htmlFile")

    Open "C:\temp\test.html" For Input As #1
        src = Input$(LOF(1), 1)
    Close #1

    dom.body.innerHTML = src

    Set img = dom.getelementsbytagname("img")(0)

    img.src = "..."

    Open "C:\temp\test.html" For Output As #1
        Print #1, dom.DocumentElement.outerHTML
    Close #1


End Sub

Upvotes: 4

Related Questions