Scraping (BeautifulSoup, Selenium) not possible for all DIVs?

Question

I am trying to scrape some information from a website. For most of the divs this works fine, but I have a problem reading some specific divs. At first I tried a plain requests + bs4 approach, then Selenium as well, but I still get no data back.

My full code is below. It returns a result for this search:

tmpDiv = soup.find ("div", {"id": "financial-strength"})

But it is not working with this div:

tmpDiv = soup.find ("div", {"id": "analyst-estimate"})

It outputs only

Here is the full (not working) code:

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
import sys, os
from selenium.webdriver.chrome.options import Options

link = "https://www.gurufocus.com/stock/AAPL/summary"
path = os.path.abspath (os.path.dirname (sys.argv[0]))
options = Options ()
options.add_argument ('--headless')
options.add_experimental_option ('excludeSwitches', ['enable-logging'])
cd = '/chromedriver.exe'
driver = webdriver.Chrome (path + cd, options=options)
driver.get (link)
soup = BeautifulSoup (driver.page_source, 'html.parser')
time.sleep (2)

page = requests.get (link)
soup = BeautifulSoup (page.content, 'html.parser')
# tmpDiv = soup.find ("div", {"id": "financial-strength"})
tmpDiv = soup.find ("div", {"id": "analyst-estimate"})
print(tmpDiv.prettify())

I heard this is probably a "lazy loading website", but shouldn't Selenium wait until the full page is loaded with all its content?

HedgeHog · Accepted Answer

What happens?

There are two main reasons why you don't get the result:

After loading the page with Selenium, you request it again with requests and assign that response to soup, which replaces the soup built from driver.page_source.
The data is not loaded until it is needed, which is what you already figured out: it is a "lazy loading website".
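
The first point can be reproduced without a browser. This is only a sketch: the two HTML strings are hypothetical stand-ins (not the real gurufocus markup) for what the browser holds after its scripts have run and for what a plain HTTP response might contain.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins, not the real page markup:
# the div as the browser sees it after its scripts have filled it ...
rendered_html = '<div id="analyst-estimate"><span>Revenue (Mil $)</span></div>'
# ... and the same div as a plain HTTP response might deliver it.
static_html = '<div id="analyst-estimate"></div>'

soup = BeautifulSoup(rendered_html, "html.parser")  # plays the role of driver.page_source
soup = BeautifulSoup(static_html, "html.parser")    # plays the role of page.content; replaces the soup above

tmp_div = soup.find("div", {"id": "analyst-estimate"})
div_text = tmp_div.get_text(strip=True)
print(repr(div_text))  # the rendered content is gone, only the empty div is left
```

Whatever Selenium rendered is discarded as soon as soup is reassigned, so the search only ever sees the second document.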

How to fix that?

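Remove all requests-specific lines.

Scroll the element you need into view so that its data gets loaded:

from selenium.webdriver.common.by import By

# find_element_by_id was removed in recent Selenium 4 releases; find_element(By.ID, ...) works in old and new versions
element = driver.find_element(By.ID, "analyst-estimate")
driver.execute_script("arguments[0].scrollIntoView();", element)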
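
A fixed pause after scrolling may be too short on a slow connection and needlessly long on a fast one. As an alternative, here is a minimal polling sketch. It assumes the div already exists in the DOM but has no text until its data has loaded, which is an assumption about this page rather than something confirmed above. scroll_and_wait is a hypothetical helper name, and "id" is the string value of Selenium's By.ID.

```python
import time


def scroll_and_wait(driver, element_id, timeout=10.0, poll=0.5):
    """Scroll an element into view, then poll until it contains text.

    Hypothetical helper: assumes the element is present but empty
    until the lazily loaded data arrives.
    """
    # "id" is the value of By.ID, so no Selenium import is needed here
    element = driver.find_element("id", element_id)
    driver.execute_script("arguments[0].scrollIntoView();", element)

    deadline = time.monotonic() + timeout
    while True:
        if element.text.strip():
            return element  # data has arrived
        if time.monotonic() >= deadline:
            raise TimeoutError(f"#{element_id} still empty after {timeout} s")
        time.sleep(poll)
```

With an open driver, scroll_and_wait(driver, "analyst-estimate") would take the place of the scroll call and the pause that follows it. Selenium's own WebDriverWait offers the same idea with ready-made conditions.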

Example

Note that I used my own webdriver path, so you have to change it.

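import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

link = "https://www.gurufocus.com/stock/AAPL/summary"
# Raw string, so the backslashes in the Windows path are not read as escapes.
# Selenium 4 style; older versions took the driver path as the first argument.
service = Service(r'C:\Program Files\ChromeDriver\chromedriver.exe')
driver = webdriver.Chrome(service=service)
driver.get(link)

# Give the page time to render
time.sleep(2.5)

# Scrolling the lazily loaded element into view triggers loading of its data
element = driver.find_element(By.ID, "analyst-estimate")
driver.execute_script("arguments[0].scrollIntoView();", element)

# Give the data time to arrive
time.sleep(1)

# Parse the page as rendered by the browser, not a separate requests response
soup = BeautifulSoup(driver.page_source, 'html.parser')

# tmpDiv = soup.find("div", {"id": "financial-strength"})
tmpDiv = soup.find("div", {"id": "analyst-estimate"})
print(tmpDiv.prettify())

driver.quit()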

Output

Text content of the printed div (markup omitted):

Analyst Estimate

                            Sep 2021    Sep 2022    Sep 2023
Revenue (Mil $)            313003.40   328872.10   341577.60
EBIT (Mil $)                76803.87    81038.89    84830.53
EBITDA (Mil $)              88706.60    92604.88    94034.53
EPS ($)                         3.94        4.28        4.55
EPS without NRI ($)             3.97        4.27        4.55
EPS Growth Rate (%)            10.04
Dividends per Share ($)         0.74        0.82        1.15
