vitruvius

Reputation: 21119

InnerText returns empty for a specific span class

I am trying to retrieve regular (126,37€) and reduced (101,10€) price information from this website.

Simplified HTML code looks like this:

<div class="vw-productFeatures ">
  <ul class="feature-list -price-container">
    <li class="feature -price">
      <span class="value">126,37</span>
    </li>
  </ul>
  <ul class="feature-list vw-productVoucher">
    <li class="voucher-information">Mit Code
      <span class="voucher-reduced-price">101,10</span>
    </li>
  </ul>
</div>

So, I basically go step by step (div class -> ul class -> li class -> span class) and get the innerText at the end.

I can get the regular price, but the innerText of the reduced-price span comes back empty.

This is the code I am working with:

Function getHTMLelemFromCol(HTMLColIn As MSHTML.IHTMLElementCollection, tagNameIn As String, classNameIn As String) As MSHTML.IHTMLElement
    Dim HTMLitem As MSHTML.IHTMLElement

    For Each HTMLitem In HTMLColIn
        If (HTMLitem.tagName = tagNameIn) Then
            If (HTMLitem.className = classNameIn) Then
                Set getHTMLelemFromCol = HTMLitem
                Exit For
            End If
        End If
    Next HTMLitem
End Function

Function getPrice(webSite As String, divClass As String, ulClass As String, liClass As String, spanClass As String) As String
    Dim XMLPage As New msxml2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    Dim HTMLitem As MSHTML.IHTMLElement
    Dim HTMLObjCol As MSHTML.IHTMLElementCollection

    XMLPage.Open "GET", webSite, False
    XMLPage.send
    HTMLDoc.body.innerHTML = XMLPage.responseText

    Set HTMLObjCol = HTMLDoc.getElementsByClassName(divClass)
    Set HTMLitem = getHTMLelemFromCol(HTMLObjCol, "DIV", divClass)          ' Find the div class we are interested in first
    Set HTMLitem = getHTMLelemFromCol(HTMLitem.Children, "UL", ulClass)     ' Find the ul class we are interested in
    Set HTMLitem = getHTMLelemFromCol(HTMLitem.Children, "LI", liClass)     ' Find the li class we are interested in
    Set HTMLitem = getHTMLelemFromCol(HTMLitem.Children, "SPAN", spanClass) ' Find the span class we are interested in

    getPrice = HTMLitem.innerText
End Function

Sub Run()
    Dim webSite As String, divClass As String, ulClass As String, liClass As String, spanClass As String, regularPrice As String, reducedPrice As String

    webSite = "https://www.rakuten.de/produkt/msi-b450-tomahawk-max-atx-mainboard-4x-ddr4-max-64gb-1x-dvi-d-1x-hdmi-14-1x-usb-c-31-2843843890"
    divClass = "vw-productFeatures "

    ' Get the regular price
    ulClass = "feature-list -price-container"
    liClass = "feature -price"
    spanClass = "value"
    regularPrice = getPrice(webSite, divClass, ulClass, liClass, spanClass)

    ' Get the reduced price
    ulClass = "feature-list vw-productVoucher -hide"
    liClass = "voucher-information"
    spanClass = "voucher-reduced-price"
    reducedPrice = getPrice(webSite, divClass, ulClass, liClass, spanClass)

    Debug.Print "Regular price: " & regularPrice
    Debug.Print "Reduced price: " & reducedPrice
End Sub

The output I am getting:

Regular price: 126,37
Reduced price: 

The debugger shows that the code finds the correct span, but none of its properties (including innerText) contains the price.

How can I get the reduced price information?

Upvotes: 1

Views: 656

Answers (2)

Ryan Wildry

Reputation: 5677

When much of a page's content depends on API calls, it is sometimes easier to use browser automation.

It is not ideal from a performance perspective, but it is quicker to get working and will do in a pinch. The alternative is to monitor the web traffic between your browser and the server and see whether you can emulate the requests that produce the reduced price. That would run faster, but it may take some time to work out how the requests fit together.

Each approach has trade-offs to consider. Below is some Internet Explorer automation code that works for me and retrieves the data I believe you are after.

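' 64-bit Office (VBA7) requires PtrSafe on API declarations
#If VBA7 Then
Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If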

Sub GetReducedPrice()
    Dim text As String

    With CreateObject("internetexplorer.application")
        .navigate "https://www.rakuten.de/produkt/msi-b450-tomahawk-max-atx-mainboard-4x-ddr4-max-64gb-1x-dvi-d-1x-hdmi-14-1x-usb-c-31-2843843890"
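        ' Wait until the browser is idle and the document has fully loaded (readyState 4)
        Do While .Busy Or .readyState <> 4: DoEvents: Loop
        Sleep 1000 ' give the page scripts a moment to fill in the price
        text = .document.querySelector(".voucher-reduced-price").innerText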
        .Quit
    End With

    Debug.Print "the reduced price is: " & text
End Sub

Result is:

the reduced price is: 101,10
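The fixed one-second Sleep is a guess at how long the page scripts need. As a rough sketch, one alternative is to poll for the element until it has text or a timeout expires. WaitForText and its parameters are hypothetical names introduced here for illustration, not part of the code above, and the sketch assumes the page's document supports querySelector, as it does in the run shown.

```vba
' Hypothetical helper (not from the original answer): polls the document
' until the first element matching `selector` has non-empty text, or until
' `timeoutSeconds` have elapsed. Returns "" on timeout.
Function WaitForText(doc As Object, selector As String, timeoutSeconds As Single) As String
    Dim startTime As Single
    Dim node As Object

    startTime = Timer ' seconds since midnight; this sketch ignores the midnight rollover
    Do
        Set node = doc.querySelector(selector)
        If Not node Is Nothing Then
            If Len(Trim$(node.innerText)) > 0 Then
                WaitForText = Trim$(node.innerText)
                Exit Function
            End If
        End If
        DoEvents ' keep the host application responsive while waiting
    Loop While Timer - startTime < timeoutSeconds
End Function
```

Inside the With block, the Sleep and querySelector lines could then be replaced by something like `text = WaitForText(.document, ".voucher-reduced-price", 10)`, where the ten-second limit is an arbitrary choice.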

Upvotes: 1

Sers

Reputation: 12255

There is no -hide class on the reduced-price list, so the class name should be:

ulClass = "feature-list vw-productVoucher"

You can get both prices with simple querySelector selectors instead of walking the element tree with unnecessary iterations.

regularPrice = HTMLDoc.querySelector(".-price .value").innerText
reducedPrice = HTMLDoc.querySelector(".voucher-reduced-price").innerText
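A minimal sketch of how these two selector lines might fit into a complete routine, reusing the XMLHTTP request from the question. SelectText and PrintPrices are hypothetical names, not part of either post. Because querySelector returns Nothing when no element matches, the helper guards against that before reading innerText. If the span is filled in by a script after the page loads, the text may still come back empty with this approach.

```vba
' Hypothetical helper, not part of the original answer.
' Needs references to "Microsoft HTML Object Library" and "Microsoft XML, v6.0".
Function SelectText(doc As MSHTML.HTMLDocument, selector As String) As String
    Dim node As Object

    Set node = doc.querySelector(selector)
    ' querySelector returns Nothing when no element matches, so check
    ' before reading innerText to avoid run-time error 91.
    If Not node Is Nothing Then SelectText = Trim$(node.innerText)
End Function

Sub PrintPrices()
    Dim XMLPage As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument

    XMLPage.Open "GET", "https://www.rakuten.de/produkt/msi-b450-tomahawk-max-atx-mainboard-4x-ddr4-max-64gb-1x-dvi-d-1x-hdmi-14-1x-usb-c-31-2843843890", False
    XMLPage.send
    HTMLDoc.body.innerHTML = XMLPage.responseText

    ' An empty string means the element was missing or had no text
    Debug.Print "Regular price: " & SelectText(HTMLDoc, ".-price .value")
    Debug.Print "Reduced price: " & SelectText(HTMLDoc, ".voucher-reduced-price")
End Sub
```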

Update: The voucher logic lives in https://tags.tiqcdn.com/utag/rakuten/main/prod/utag.js, and the reduced price is calculated there based on product_shop_id and dates.

Upvotes: 0
