Andrey Rassanov
Andrey Rassanov

Reputation: 21

Parse data from internet HTML page using VBA and IE

Diclosure: I'm just an office clerk and very-very new to VBA and HTML. Hopefully you will be patient with me. I would really appreciate any guidance and help. Hopefully I'm formatting it correctly.

I spent whole day yesterday trying to import information from intranets web-page to automate routing copying and pasting. This will really help in the long run.

Since power-query doesn't seem to see the table I need, I figured the only option is using VBA. MsServer tool grabs page perfectly, but to my disappointment page came out with error, since it needs to authorize before access it first.

I figured with the use of IE it should work, since IE have login information in cookies.

Here where I got so far.

 Sub ExtractFromEndeca() Dim ie As InternetExplorer Dim html As
 IHTMLDocument Set ie = CreateObject("InternetExplorer.Application")
 ie.Visible = False 
 ie.Navigate "intranet address"
 While ie.Busy
     DoEvents Wend While ie.ReadyState < 4
     DoEvents Wend
     Set Doc = CreateObject("htmlfile")
     Set Doc = ie.document
     Set Data = Doc.getElementById("findSimilarOptions2")
     Sheet1.Cells(1, 1) = Data
     ie.Quit Set ie = Nothing

 ThisWorkbook.Sheets(1).Cells(1, 1) = Data

 End Sub

Result is [object] in Cell A1 and that's it and I can't understand if I got past login or not.

Here is a page fragment I'm trying to grab. Ideally this data will be output as a table.

   <td valign="top" id="findSimilarOptions2">
<div class="subtitle">Part Attributes</div>
    <input type="checkbox" id="n_200012" value="-19192896" NAME="n_200012">
    <b>
    ASSY TYPE</b>&nbsp;>
    Component<br>

    <input type="checkbox" id="n_200013" value="-18148519" NAME="n_200013">
    <b>
    PARAMETER I NEED(1)</b>&nbsp;>
    VALUE I NEED(1)<br>

    <input type="checkbox" id="n_200006" value="-20823731" NAME="n_200006">
    <b>
    PARAMETER I NEED(2)</b>&nbsp;>
    VALUE I NEED(2)<br>

    <input type="checkbox" id="n_200006" value="-20823618" NAME="n_200006">
    <b>
    PARAMETER I NEED(3)</b>&nbsp;>
    VALUE I NEED(3)<br>

    <input type="checkbox" id="n_200006" value="-20823586" NAME="n_200006">
    <b>
    PARAMETER I NEED(4)</b>&nbsp;>
    VALUE I NEED(4)<br>
    ...

Upvotes: 2

Views: 374

Answers (1)

Zwenn
Zwenn

Reputation: 2267

Please read my comments in the following code:

'Use the following line in every module head
'It forces you to define all variables
Option Explicit

Sub ExtractFromEndeca()

Dim ie As InternetExplorer
Dim doc As IHTMLDocument 'You don't use html in your code, but doc
Dim data As HTMLHtmlElement 'You should define all variables
Dim singleData As HTMLHtmlElement 'New variable
Dim row As Long 'New variable

  row = 1 'First row for output in Excel table

  'Set ie = CreateObject("InternetExplorer.Application") 'This could be problematic on the intranet due to security guidelines
  Set ie = GetObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}") 'Try this instead to initialize the IE
  ie.Visible = True 'This property should be True while development
  ie.Navigate "intranet address"
  'While ie.Busy: DoEvents: Wend 'You don't need this line
  While ie.ReadyState <> 4: DoEvents: Wend
  'Set Doc = CreateObject("htmlfile") 'You don't need this line
  Set doc = ie.document
  Set data = doc.getElementById("findSimilarOptions2").getElementsByTagName("input")

  'Data is only a reference to an object
  'You want the text information which lies in the value attributes of each input tag
  For Each singleData In data
    Sheet1.Cells(row, 1) = data.Value
    row = row + 1
  Next singleData

  'Clean up
  '(Automatic after development has finished)
  'ie.Quit
  'Set ie = Nothing
End Sub

Upvotes: 3

Related Questions