bentolomew
bentolomew

Reputation: 1

How to extract a string from a htmldiv element without ID using VBA

I am new in VBA and trying to automate the following:

  1. Open a webpage
  2. Input a string from my excel sheet into a search window, click search
  3. Click on the only result
  4. Find and extract the email adress which is connected to the person I found and copy into excel sheet

I have come to the point where I have found the person, but cannot get the email as it is stored in a htmldiv element which I cannot adress correctly. As there is no ID, I am trying to somehow target the place where the email adress is stored (see enclosed picture). I have tried different things but the I cannot get values out of the Email_search object. Further variables stay empty. Storing data into the excel sheet works with other data from the webpage.

Sorry if this is trivial but I am stuck and would apprechiate your help. For testing purposes you can replace the value for "input_element.Value" with "Dipl.-Ing.Univ. Ali Riza Acer". Here is my code:

Sub Test1()

    Dim IE As Object
    'Dim doc As HTMLDocument
    Set IE = CreateObject("InternetExplorer.Application")

    IE.Visible = True
    IE.navigate "https://www.bayika.de/de/ingenieursuche/"
    
    Do While IE.Busy
        Application.Wait DateAdd("s", 1, Now)
    Loop
            
    Set doc = IE.document
    
    'Get string from excel and put into search window
       
    Set the_input_elements = IE.document.getElementsByName("suchwort")
        For Each input_element In the_input_elements
        If input_element.getAttribute("name") = "suchwort" Then
        input_element.Value = ThisWorkbook.Sheets("Sortiert").Range("B2").Value
    Exit For
    End If
    Next input_element
    
    
    'Press search button
    
    Set the_input_elements2 = IE.document.getElementsByTagName("button")
    For Each input_element2 In the_input_elements2
        input_element2.Click
    Exit For
    'End If
    Next input_element2
    
    'Wait until webpage has loaded
    
    Do While IE.Busy
        Application.Wait DateAdd("s", 1, Now)
    Loop
    
    'Click the only list entry
    
    Set the_input_elements3 = IE.document.getElementsByClassName("listEntry listEntryClickable listEntryClickableJS")
    For Each input_element3 In the_input_elements3
        input_element3.Click
    Exit For
    Next input_element3

'%%%%%%%%%%%%%% From here the code does not work %%%%%%%%%%%%
    
            'Find email and save - Variant 1: Look for a and target line, doesn't work because it cannot get the value out
                                            
            'Set the_input_elements5 = IE.document.getElementsByTagName("a")(57)
                                                
                                                               
            'ThisWorkbook.Sheets("Sortiert").Range("F2").Value = the_input_elements5
                                                                                  
                                                
     Count = 0

    Set Email_search = IE.document.getElementsByClassName("elementStandard elementContent elementContainerStandard elementContainerStandard_var1 elementContainerStandardColumns elementContainerStandardColumns2 elementContainerStandardColumns_var5050 wglAdjustHeightMax")
                  
        For Each Email_element In Email_search
                  
            Email = Email_search.getElementsByClassName("col col1")
            
            Var = 1
            Count = Count + Var
        
            If InStr(1, Email, "mailto") > 0 Then
                ThisWorkbook.Sheets("Sortiert").Range("F2").Value = Email
                Exit For
            End If
        Next Email_element

End Sub

This is the part of the website I would like to extract:

Code part from webpage

Thank you very much!

Upvotes: 0

Views: 215

Answers (1)

SIM
SIM

Reputation: 22440

Okay, the following should do it. I've used xmlhttp request (the fastest method) instead of IE.

Sub GetInformation()
    Const baseUrl = "https://www.bayika.de"
    Const URL = "https://www.bayika.de/de/ingenieursuche/suchergebnis.php?"
    
    Dim oHttp As Object, Html As HTMLDocument, sParams As String
    Dim MyDict As Object, DictKey As Variant, oElem As Object
    Dim InnerPageUrl As String, nameToSearch As String
    
    Set oHttp = CreateObject("MSXML2.XMLHTTP")
    Set Html = New HTMLDocument
    Set MyDict = CreateObject("Scripting.Dictionary")
    
    nameToSearch = "Dipl.-Ing.Univ. Ali Riza Acer"  'This is the variable holding your search term

    MyDict("suchwort") = nameToSearch
    MyDict("plz_bis") = ""
    MyDict("plz_von") = ""
    
    For Each DictKey In MyDict
        sParams = IIf(Len(DictKey) = 0, WorksheetFunction.encodeURL(DictKey) & "=" & WorksheetFunction.encodeURL(MyDict(DictKey)), _
                        sParams & "&" & WorksheetFunction.encodeURL(DictKey) & "=" & WorksheetFunction.encodeURL(MyDict(DictKey)))
    Next DictKey


    With oHttp
        .Open "GET", URL & sParams, False
        .send
        Html.body.innerHTML = .responseText
    End With
    
    Set oElem = Html.querySelector("#list_ingenieursuche li.listEntry")
    If Not oElem Is Nothing Then
        InnerPageUrl = baseUrl & Split(Split(oElem.getAttribute("onclick"), "href='")(1), "'")(0)

        With oHttp
            .Open "GET", InnerPageUrl, False
            .send
            Html.body.innerHTML = .responseText
        End With
        
        MsgBox Html.querySelector("#blockContentInner a[href*='mailto:']").innerText
    End If
End Sub

Reference to add:

Microsoft HTML Object Library

Note: If for some reason WorksheetFunction.encodeURL() throws any error on your end, it might be because of the variation of excel versions. I'm using excel 2013 by the way.

Upvotes: 1

Related Questions