Josh Henderson
Josh Henderson

Reputation: 21

Excel VBA Scrape HTML table for text

Alright,

I've spent hours searching how to grab text from a table and for the life of me I havn't come across a method that worked in my case.

Here is a sample of the HTML that Im trying to get information from

<table class="empDetailCard foldable unfolded">
    <tr>
        <td colspan="4" class="title">
            <span class="fold-control">full name</span>
        </td>
    </tr>
    <tr class="fold-row">
        <td>
            <div class="badgePhoto reg">
    <img class="photo " src="removed" />
</div>
</td>
        <td>
           <span class="line">
               <span class="section-title">Employee Info</span>
           </span>
           <div class="employeeInfo">
               <div>
                   <span class="line">
                       <span class="row-label">Login</span>
                       mylogin</span>
                   <span class="line">
                       <span class="row-label">Empl ID</span>
                       1234567</span>
                   <span class="line">
                       <span class="row-label">Badge</span>
                       1234567</span>
                   <span class="line">
                       <span class="row-label">Dept ID</span>
                       1234567</span>
                   <span class="line">
                       <span class="row-label">Location</span>
                       1234567
                       </span>
                   <span class="line">
                       <span class="row-label">Manager</span>
                       <a href="removed" 
                    title="">John, Smith</a>
                    </span>
        </td>
    </tr>
</table>

I've tried to grab "mylogin" from the Login table using GetElementByID, GetElementByName, and even regex but I've had no luck.

Function IdtoLogin(empID As String)
     Dim H As Object, html As Object, objResult As Object
     Set H = CreateObject("WinHttp.WinHttpRequest.5.1")
     H.Open "GET", "myurl" & empID
     H.setRequestHeader "Content-Type", "text/xml"
     H.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
     H.SetAutoLogonPolicy 0
     H.send

     Set html = New HTMLDocument
     html.Body.innerHTML = H.ResponseText
     Set objResult = html.GetElementById("Login")
     IdtoLogin = objResult.innerHTML

End Function

The response returns the correct HTML with the login information but it fails to get the element ID and throws a "runtime error 91". If someone could point out the obvious for me thatd be great because Im going crazy.

Upvotes: 2

Views: 738

Answers (1)

QHarr
QHarr

Reputation: 84465

Try a CSS selector

html.querySelector("div.employeeInfo span")

The info you want maybe part of outerHTML for example. By using singular querySelector you get the first node match which, in the HTML shown, is mylogin:

CSS query

Upvotes: 1

Related Questions