Sharid

Reputation: 161

How to get elements that are outside the parent class

I am trying to extract some data from a web page. However, NOT all of the information I need is inside the parent class. I can already get the information that is inside the parent class.

QUESTION - Is there a way to get data that is outside the parent class? Or is there a way to set up the code below so that it extracts the data without using a parent class?

Link

I am using IE because it allows me to search the site. I have tried several code variations; however, the extra information is not in the parent class that I am extracting from.

I am after the name, location and social media links. The location is at the top of the web page, outside the class.

other info

I tried using shop-home as the parent class, since all the other classes fall inside it, but it did not work. I have never tried to get data that is not in the parent class, so I am not 100% sure how to do it. SIM helped with element.ParentNode.ParentNode.getElementsByClassName, because the product URL was before the parent. I have been trying to use the same approach for all the other data that is outside the parent, but I cannot get it to work. I do not fully understand it; if someone could explain what .ParentNode.ParentNode is doing, that would help my understanding and I might be able to work the rest out myself.

The code below is for the first two items, which pull off fine. The code layout is the same for all items, except that each test has the form If element.getElementsByClassName("CLASS HERE")(0). I have tried using ID, tag, span and so on, for example If element.getElementsByClassName("CLASS HERE")(0).getElementsByTagName("Span")(0).

    Application.ScreenUpdating = False
    Set HTML = objIE.document

    '########## Setting the parent class HERE ##########
    Set elements = HTML.getElementsByClassName("v2-listing-card__info")

    'Scroll down the browser
    objIE.document.parentWindow.Scroll 0&, 9999

    'FOR LOOP
    For Each element In elements
        ''' Element 1: listing URL (the link is outside the info block, so search from its grandparent)
        If element.ParentNode.ParentNode.getElementsByClassName("listing-link")(0) Is Nothing Then
            wsSheet.Cells(sht.Cells(sht.Rows.Count, "A").End(xlUp).Row + 1, "A").Value = "-"
        Else
            HtmlText = element.ParentNode.ParentNode.getElementsByClassName("listing-link")(0).href
            wsSheet.Cells(sht.Cells(sht.Rows.Count, "A").End(xlUp).Row + 1, "A").Value = HtmlText
        End If
        ''' Element 2: title (first h3 inside the info block)
        If element.getElementsByTagName("h3")(0) Is Nothing Then
            wsSheet.Cells(sht.Cells(sht.Rows.Count, "B").End(xlUp).Row + 1, "B").Value = "-"
        Else
            HtmlText = element.getElementsByTagName("h3")(0).innerText
            wsSheet.Cells(sht.Cells(sht.Rows.Count, "B").End(xlUp).Row + 1, "B").Value = HtmlText 'return value in column B
        End If
        ''' Element 3
        ' ... the remaining elements follow the same pattern ...
    Next element

RESULTS - Data in red is wrong or missing because it is not in the parent class above.

Wrong data

The shipping info in column H pulls off fine because it is inside the parent; if there is no shipping info, a hyphen goes into the cell. The items for columns C, D and E are outside the parent class I am using.

<div class="flex-grow-1">
  <div class="max-width-760px ">


  </div>

  <div class="max-width-676px">
    <div class="">
      <p class="wt-text-heading-02 wt-display-inline" data-inplace-editable-text="story_headline" data-endpoint="AboutPost" data-key="story_headline" data-placeholder="Sum up what you do in one sentence. Or just write something catchy." data-use-inplace-input="1"
        data-add-class="normal story-headline-edit-link"></p>
    </div>
    <div class="">
      <div id="about-story" class="" aria-hidden="false">
        <p class="about-story text-body-larger text-gray-lighter ">
          <span class="mt-xs-1" data-inplace-editable-text="story" data-endpoint="AboutPost" data-key="story" data-placeholder="How did you get started? What inspires you? We know each seller’s story is unique — tell yours here."></span>
        </p>

      </div>
      <div class="wt-text-center-xs">

      </div>
    </div>
  </div>

  <div class="wt-mb-xs-6 wt-mb-md-8">
    <div class="clearfix"></div>

    <div>
      <h3 class="wt-text-title-01"></h3>
      <div class="pt-xs-2 pt-lg-4">
        <div class="display-flex-md flex-wrap max-width-760px">
          <div class="mb-xs-2 text-body mr-md-6">
            <a href="https://www.facebook.com/Lucky-Plum-706715642737271/" class="text-decoration-none clearfix" title="Facebook" target="_blank" rel="nofollow noopener">
              <span class="etsy-icon"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" focusable="false"><path d="M20,5V19a1.007,1.007,0,0,1-1,1H15V13.776h2l0.336-2.3H15V9.659a0.912,0.912,0,0,1,1-1.031h1.5V6.55a11.284,11.284,0,0,0-1.641-.109c-2.2,0-3.3,1.219-3.3,3.039v1.992h-2v2.3h2V20H5a1.007,1.007,0,0,1-1-1V5A1.007,1.007,0,0,1,5,4H19A1.007,1.007,0,0,1,20,5Z"></path></svg></span>
              <span>Facebook</span>
            </a>
          </div>
          <div class="mb-xs-2 text-body mr-md-6">
            <a href="https://www.instagram.com/luckyplumstudio/" class="text-decoration-none clearfix" title="Instagram" target="_blank" rel="nofollow noopener">
              <span class="etsy-icon"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" focusable="false"><path d="M12,5.447c2.136,0,2.389,0.008,3.233,0.047c0.78,0.036,1.204,0.166,1.485,0.275c0.373,0.145,0.64,0.318,0.92,0.598 c0.28,0.28,0.453,0.546,0.598,0.92c0.11,0.282,0.24,0.706,0.275,1.485c0.038,0.844,0.047,1.097,0.047,3.233 s-0.008,2.389-0.047,3.233c-0.036,0.78-0.166,1.204-0.275,1.485c-0.145,0.373-0.318,0.64-0.598,0.92 c-0.28,0.28-0.546,0.453-0.92,0.598c-0.282,0.11-0.706,0.24-1.485,0.275c-0.843,0.038-1.096,0.047-3.233,0.047 s-2.389-0.008-3.233-0.047c-0.78-0.036-1.204-0.166-1.485-0.275c-0.373-0.145-0.64-0.318-0.92-0.598 c-0.28-0.28-0.453-0.546-0.598-0.92c-0.11-0.282-0.24-0.706-0.275-1.485c-0.038-0.844-0.047-1.097-0.047-3.233 S5.45,9.616,5.488,8.773c0.036-0.78,0.166-1.204,0.275-1.485c0.145-0.373,0.318-0.64,0.598-0.92c0.28-0.28,0.546-0.453,0.92-0.598 c0.282-0.11,0.706-0.24,1.485-0.275C9.611,5.455,9.864,5.447,12,5.447 M12,4.005c-2.173,0-2.445,0.009-3.298,0.048 C7.85,4.092,7.269,4.227,6.76,4.425C6.234,4.63,5.787,4.903,5.343,5.348C4.898,5.793,4.624,6.239,4.42,6.765 c-0.198,0.509-0.333,1.09-0.372,1.942C4.009,9.56,4,9.833,4,12.005c0,2.173,0.009,2.445,0.048,3.298 c0.039,0.852,0.174,1.433,0.372,1.942c0.204,0.526,0.478,0.972,0.923,1.417c0.445,0.445,0.891,0.718,1.417,0.923 c0.509,0.198,1.09,0.333,1.942,0.372c0.853,0.039,1.126,0.048,3.298,0.048s2.445-0.009,3.298-0.048 c0.852-0.039,1.433-0.174,1.942-0.372c0.526-0.204,0.972-0.478,1.417-0.923c0.445-0.445,0.718-0.891,0.923-1.417 c0.198-0.509,0.333-1.09,0.372-1.942C19.991,14.45,20,14.178,20,12.005s-0.009-2.445-0.048-3.298 c-0.039-0.852-0.174-1.433-0.372-1.942c-0.204-0.526-0.478-0.972-0.923-1.417c-0.445-0.445-0.891-0.718-1.417-0.923 c-0.509-0.198-1.09-0.333-1.942-0.372C14.445,4.014,14.173,4.005,12,4.005L12,4.005z"></path><path d="M12,7.897c-2.269,0-4.108,1.839-4.108,4.108S9.731,16.113,12,16.113s4.108-1.839,4.108-4.108S14.269,7.897,12,7.897z  
M12,14.672c-1.473,0-2.667-1.194-2.667-2.667S10.527,9.339,12,9.339s2.667,1.194,2.667,2.667S13.473,14.672,12,14.672z"></path><circle cx="16.27" cy="7.735" r="0.96"></circle></svg></span>
              <span>Instagram</span>
            </a>
          </div>
        </div>
      </div>
    </div>
  </div>

  <div class="wt-mb-xs-8 wt-mb-md-10">
    <div class="clearfix"></div>

    <div class="about-section display-flex-md flex-direction-column-md  mb-md-5 pl-xs-0 pr-xs-0" data-region="shop-members" id="shop-members">
      <div class="p-xs-0">
        <h3 class="wt-text-title-01">Shop members</h3>
      </div>
      <div class="pl-xs-0 pr-xs-0  pt-xs-2 pt-lg-4">
        <div class="max-width-760px">
          <ul class="list-unstyled block-grid-md-2" data-region="shop-member-list">
            <li class="pt-xs-2 pb-xs-2 block-grid-item" data-region="shop-member" data-member-id="22676501471" data-member-avatar-url="https://i.etsystatic.com/isc/87253d/22676501471/isc_90x90.22676501471_6w54.jpg?version=0" data-member-bio="" data-member-role="Owner"
              data-member-name="Lucky Plum Studio">
              <div class="flag">
                <div class="flag-img vertical-align-top pr-lg-3">
                  <img src="https://i.etsystatic.com/isc/87253d/22676501471/isc_90x90.22676501471_6w54.jpg?version=0" alt="" class="circle" data-region="member-avatar" width="48" height="48">
                </div>
                <div class="flag-body">
                  <h6 class="mb-xs-0 b text-transform-none text-body" data-region="member-name">Lucky Plum Studio</h6>
                  <p class="prose" data-region="member-role">Owner</p>
                  <p class="text-gray-lighter mb-xs-0" data-region="member-bio">

                  </p>
                </div>
              </div>
            </li>
          </ul>
        </div>
      </div>
    </div>
  </div>

  <div class="">

  </div>
</div>

As always, thanks in advance.

######### Updated 22/3/2021 at 6 pm UK time #########

In reply to QHarr's answer: I had the following for location and nothing was collected. Could you please explain where I went wrong? Then I should be able to fix the rest.

    ''' Element 4
    DoEvents
    If element.getElementsByClassName("shop-location")(0).getElementsByTagName("Span")(0) Is Nothing Then 'get class, then child node
        wsSheet.Cells(sht.Cells(sht.Rows.Count, "D").End(xlUp).Row + 1, "D").Value = "-"
    Else
        HtmlText = element.getElementsByClassName("shop-location")(0).getElementsByTagName("Span")(0).innerText
        wsSheet.Cells(sht.Cells(sht.Rows.Count, "D").End(xlUp).Row + 1, "D").Value = HtmlText
    End If

Upvotes: 0

Views: 147

Answers (1)

QHarr

Reputation: 84465

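Not sure what to say except: read up on HTML, HTML document methods and CSS selectors so you understand the patterns you need to apply. The rest is just practice, and learning which methods are fastest and most robust.

In your location code, element is one listing card, and element.getElementsByClassName only searches inside that card. The shop-location block sits at the top of the page, outside every card, so it is never found; search from the document (HTML in your code, .document below) instead. Likewise, element.ParentNode.ParentNode simply steps two levels up the DOM tree to the grandparent of element, so a search started there covers everything inside that grandparent, such as the listing link.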
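What ParentNode.ParentNode does is easiest to see on a toy tree. This is a sketch in Python rather than VBA, using only the standard library's xml.etree.ElementTree; the card markup is a hypothetical simplification (not Etsy's real HTML, which is not well-formed XML anyway). ElementTree has no parentNode property, so the sketch builds a child-to-parent map, and first_by_class is a hypothetical helper that mimics getElementsByClassName(name)(0) by searching only below the node it is called on.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified listing card (NOT Etsy's real markup): the link
# sits beside the info block, not inside it.
CARD = """
<div class="card">
  <a class="listing-link" href="https://example.com/listing/1">photo</a>
  <div class="wrapper">
    <div class="v2-listing-card__info"><h3>Example mug</h3></div>
  </div>
</div>
""".strip()


def first_by_class(scope, name):
    """Mimic getElementsByClassName(name)(0): look only BELOW scope, None if absent."""
    for el in scope.iter():  # iter() yields scope itself first, then its descendants
        if el is not scope and name in el.get("class", "").split():
            return el
    return None


root = ET.fromstring(CARD)
# ElementTree has no parentNode, so map every child to its parent once.
parent = {child: p for p in root.iter() for child in p}

info = first_by_class(root, "v2-listing-card__info")
print(first_by_class(info, "listing-link"))  # None: the link is not below info

grandparent = parent[parent[info]]  # same idea as element.ParentNode.ParentNode
print(grandparent.get("class"))  # card
print(first_by_class(grandparent, "listing-link").get("href"))  # https://example.com/listing/1
```

Moving up only changes where the search starts; anything outside that ancestor, such as the shop location at the top of the page, is still out of reach, which is why a document-level search is the general fix.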


CSS:

  1. Location: .shop-location span matches span elements that are descendants of an element with class shop-location.

  2. Social media links: #about .text-decoration-none matches elements that have text-decoration-none among their class names and are descendants of the element with id about.

  3. Name: [data-region='member-name'] matches the element whose data-region attribute has the value member-name.
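The three selectors can be mirrored with plain tree walks, which makes the descendant combinator (the space) concrete. This is a Python sketch with xml.etree.ElementTree on a trimmed copy of the question's HTML; the id="about" wrapper and the shop-location block with its placeholder text are assumptions, since neither appears in the posted snippet. ElementTree only parses well-formed XML, so this illustrates the matching rules and is not a substitute for querySelector on a live page.

```python
import xml.etree.ElementTree as ET

# Trimmed from the question's HTML. ASSUMED: the id="about" wrapper and the
# shop-location block (with placeholder text) are not in the posted snippet.
PAGE = """
<div>
  <div class="shop-location wt-display-flex-xs"><span>Placeholder Town</span></div>
  <div id="about">
    <a href="https://www.facebook.com/Lucky-Plum-706715642737271/" class="text-decoration-none clearfix">Facebook</a>
    <a href="https://www.instagram.com/luckyplumstudio/" class="text-decoration-none clearfix">Instagram</a>
    <h6 class="mb-xs-0 b" data-region="member-name">Lucky Plum Studio</h6>
  </div>
</div>
""".strip()


def has_class(el, name):
    """True if name is one of the space-separated tokens in el's class attribute."""
    return name in el.get("class", "").split()


def descendants(scope):
    """All elements below scope: what the space (descendant combinator) matches."""
    return [el for el in scope.iter() if el is not scope]


doc = ET.fromstring(PAGE)

# .shop-location span
location = [span.text
            for box in doc.iter() if has_class(box, "shop-location")
            for span in descendants(box) if span.tag == "span"]

# #about .text-decoration-none
about = next(el for el in doc.iter() if el.get("id") == "about")
links = [a.get("href") for a in descendants(about) if has_class(a, "text-decoration-none")]

# [data-region='member-name'] (ElementTree's XPath subset supports attribute tests)
name = doc.find(".//*[@data-region='member-name']").text

print(location)
print(links)
print(name)
```

In VBA the same selector strings go straight into .querySelector and .querySelectorAll on the IE document.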

Read about CSS selectors and the descendant combinator here

Practice CSS selectors here

Learn about HTML here


VBA:

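Option Explicit
'Requires a reference (Tools > References) to Microsoft Internet Controls for SHDocVw.InternetExplorer and READYSTATE_COMPLETE
Public Sub GetInfo()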
    Dim ie As SHDocVw.InternetExplorer

    Set ie = New SHDocVw.InternetExplorer

    With ie

        .Visible = True
        .Navigate2 "https://www.etsy.com/uk/shop/LuckyPlumStudio"
        While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend

        With .document
        
            Debug.Print .querySelector(".shop-location span").innerText 'location
            
            Dim i As Long, socialMedias As Object
            
            Set socialMedias = .querySelectorAll("#about .text-decoration-none")
  
            For i = 0 To socialMedias.Length - 1 'media links
                Debug.Print socialMedias.Item(i).href
            Next
            
            Debug.Print .querySelector("[data-region='member-name']").innerText 'company name
            
        End With
        .Quit
    End With

End Sub
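The snippet above reads the location from .document, i.e. the whole page, whereas the location code in the question calls element.getElementsByClassName, which only searches inside one listing card. Below is a Python sketch of that scoping difference using xml.etree.ElementTree on a hypothetical, simplified page (the structure and the placeholder location are assumptions, not Etsy's markup); by_class is a hypothetical helper that mimics getElementsByClassName. In the question's VBA the equivalent would be reading HTML.querySelector(".shop-location span").innerText once, before For Each element In elements, and writing the stored value on each row.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page (NOT Etsy's real markup): the location sits
# in the shop header, outside every listing card.
PAGE = """
<div class="shop-home">
  <div class="shop-location wt-display-flex-xs"><span>Placeholder Town</span></div>
  <div class="listings">
    <div class="v2-listing-card__info"><h3>Mug</h3></div>
    <div class="v2-listing-card__info"><h3>Plate</h3></div>
  </div>
</div>
""".strip()


def by_class(scope, name):
    """Mimic getElementsByClassName: descendants of scope only, never ancestors or siblings."""
    return [el for el in scope.iter()
            if el is not scope and name in el.get("class", "").split()]


doc = ET.fromstring(PAGE)
cards = by_class(doc, "v2-listing-card__info")

# Card-scoped search, like element.getElementsByClassName("shop-location"): finds nothing.
print([by_class(card, "shop-location") for card in cards])  # [[], []]

# Document-scoped search, done once before the loop.
boxes = by_class(doc, "shop-location")
location = boxes[0].find(".//span").text if boxes else "-"  # "-" mirrors the question's placeholder

rows = [(card.find(".//h3").text, location) for card in cards]
print(rows)  # [('Mug', 'Placeholder Town'), ('Plate', 'Placeholder Town')]
```

Reading page-level values once, outside the loop, also avoids repeating the same lookup for every card.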

Less optimal methods for selecting:

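Option Explicit

'Requires a reference (Tools > References) to Microsoft Internet Controls for SHDocVw.InternetExplorer and READYSTATE_COMPLETE
Public Sub GetInfo()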
    Dim ie As SHDocVw.InternetExplorer

    Set ie = New SHDocVw.InternetExplorer

    With ie

        .Visible = True
        .Navigate2 "https://www.etsy.com/uk/shop/LuckyPlumStudio"
        While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend

        With .document
        
            Debug.Print .getElementsByClassName("shop-location wt-display-flex-xs")(0).getElementsByTagName("span")(0).innerText 'location
            
            Dim i As Object, socialMedias As Object
            
            Set socialMedias = .getElementById("about").getElementsByClassName("text-decoration-none clearfix")
  
            For Each i In socialMedias           'media links
                Debug.Print i.href
            Next
            
            Debug.Print .getElementById("about").getElementsByClassName("flag")(0).getElementsByTagName("h6")(0).innerText 'company name
            
        End With
        .Quit
    End With

End Sub

Upvotes: 1
