Reputation: 1220
I am trying to scrape some information from a website. For most of the divs this works fine, but I have a problem reading some specific ones. At first I tried it with a plain requests + bs4 call, then also with selenium, but I still get no data back.
Below you can find my full code. It returns the expected content with this search:
tmpDiv = soup.find ("div", {"id": "financial-strength"})
But it is not working with this div:
tmpDiv = soup.find ("div", {"id": "analyst-estimate"})
It outputs only
<div class="children" data-v-39722e0c="" id="analyst-estimate" style="min-height:200px;display:block;">
</div>
Here is the full (not working) code:
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
import sys, os
from selenium.webdriver.chrome.options import Options
link = "https://www.gurufocus.com/stock/AAPL/summary"
path = os.path.abspath (os.path.dirname (sys.argv[0]))
options = Options ()
options.add_argument ('--headless')
options.add_experimental_option ('excludeSwitches', ['enable-logging'])
cd = '/chromedriver.exe'
driver = webdriver.Chrome (path + cd, options=options)
driver.get (link)
soup = BeautifulSoup (driver.page_source, 'html.parser')
time.sleep (2)
page = requests.get (link)
soup = BeautifulSoup (page.content, 'html.parser')
# tmpDiv = soup.find ("div", {"id": "financial-strength"})
tmpDiv = soup.find ("div", {"id": "analyst-estimate"})
print(tmpDiv.prettify())
I heard this is probably a "lazy loading" website, but shouldn't selenium wait until the full page is loaded with all its content?
Upvotes: 0
Views: 82
Reputation: 25196
There are two main reasons why you don't get the result:
1. After loading the page with selenium, you request it again with requests and assign that response to soup, which overwrites the selenium result.
2. The data is not loaded until it is needed, which is what you already figured out ("lazy loading website").
To fix this:
1. Remove all the requests-specific lines.
2. Scroll the element you need into view, so that its data gets loaded:
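# find_element_by_id was removed in newer Selenium releases; find_element(By.ID, ...) is the replacement
# needs: from selenium.webdriver.common.by import By
element = driver.find_element(By.ID, "analyst-estimate")
driver.execute_script("arguments[0].scrollIntoView();", element)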
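Scrolling only triggers the load, so the content shows up a little later. As an alternative to a fixed time.sleep, the sketch below scrolls the div into view and polls until it has at least one child element. It assumes that "the div has children" is a good enough sign that the lazy content has arrived on this page; the helper name wait_for_lazy_content and its timeout and poll parameters are made up for illustration. It only relies on driver.execute_script, so it does not depend on which find_element style your Selenium version supports. Selenium's own WebDriverWait is the built-in alternative for this kind of waiting.

```python
import time


def wait_for_lazy_content(driver, element_id, timeout=10.0, poll=0.25):
    """Scroll the element into view and poll until it has child elements.

    Returns True once the element has at least one child element,
    False if that has not happened when the timeout expires.
    """
    # Runs in the browser: scroll to the element and report how many
    # child elements it currently has (-1 if it is not in the DOM yet).
    script = (
        "var el = document.getElementById(arguments[0]);"
        "if (!el) { return -1; }"
        "el.scrollIntoView();"
        "return el.children.length;"
    )
    deadline = time.monotonic() + timeout
    while True:
        if driver.execute_script(script, element_id) > 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

With a live driver this could be called as wait_for_lazy_content(driver, "analyst-estimate") right after driver.get(link), and driver.page_source read only once it returns True.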
Example
Be aware that I used my own webdriver path, so you have to edit it.
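import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
link = "https://www.gurufocus.com/stock/AAPL/summary"
# Raw string so the backslashes in the Windows path are not treated as escapes.
# Passing the driver path positionally works on older Selenium releases;
# newer ones expect a Service object instead.
driver = webdriver.Chrome(r'C:\Program Files\ChromeDriver\chromedriver.exe')
driver.get(link)
time.sleep(2.5)  # give the page time to render
# Scroll the lazy-loaded div into view to trigger loading of its content
element = driver.find_element(By.ID, "analyst-estimate")
driver.execute_script("arguments[0].scrollIntoView();", element)
time.sleep(1)  # give the content time to load
# Parse the page source only after the content has been loaded
soup = BeautifulSoup(driver.page_source, 'html.parser')
# tmpDiv = soup.find("div", {"id": "financial-strength"})
tmpDiv = soup.find("div", {"id": "analyst-estimate"})
print(tmpDiv.prettify())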
Output
<div class="children" data-v-39722e0c="" id="analyst-estimate" style="">
<div class="capture-area">
<h2 class="fs-large fc-primary fw-bolder">
Analyst Estimate
</h2>
<table class="normal-table-mobile financial-strength-table">
<tbody>
<tr>
<td>
</td>
<td>
Sep 2021
</td>
<td>
Sep 2022
</td>
<td>
Sep 2023
</td>
</tr>
<tr>
<td>
Revenue (Mil $)
</td>
<td>
<span>
313003.40
</span>
</td>
<td>
<span>
328872.10
</span>
</td>
<td>
<span>
341577.60
</span>
</td>
</tr>
<tr>
<td>
EBIT (Mil $)
</td>
<td>
<span>
76803.87
</span>
</td>
<td>
<span>
81038.89
</span>
</td>
<td>
<span>
84830.53
</span>
</td>
</tr>
<tr>
<td>
EBITDA (Mil $)
</td>
<td>
<span>
88706.60
</span>
</td>
<td>
<span>
92604.88
</span>
</td>
<td>
<span>
94034.53
</span>
</td>
</tr>
<tr>
<td>
EPS ($)
</td>
<td>
<span>
3.94
</span>
</td>
<td>
<span>
4.28
</span>
</td>
<td>
<span>
4.55
</span>
</td>
</tr>
<tr>
<td>
EPS without NRI ($)
</td>
<td>
<span>
3.97
</span>
</td>
<td>
<span>
4.27
</span>
</td>
<td>
<span>
4.55
</span>
</td>
</tr>
<tr>
<td>
EPS Growth Rate (%)
</td>
<td>
<span>
10.04
</span>
</td>
<td>
<!-- -->
</td>
<td>
<!-- -->
</td>
</tr>
<tr>
<td>
Dividends per Share ($)
</td>
<td>
<span>
0.74
</span>
</td>
<td>
<span>
0.82
</span>
</td>
<td>
<span>
1.15
</span>
</td>
</tr>
</tbody>
</table>
</div>
</div>
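Once the div is populated, the table can be turned into plain Python data instead of printed HTML. This is a minimal sketch, assuming the layout shown in the output above (first row holds the periods, first cell of each later row holds the label); the function names table_to_rows and rows_to_dict are made up for illustration. It only uses find_all and get_text, which the bs4 Tag in tmpDiv provides.

```python
def table_to_rows(div):
    """Return the table inside `div` as a list of rows of stripped cell text.

    `div` is expected to be a bs4 Tag such as tmpDiv from the example.
    """
    rows = []
    for tr in div.find_all("tr"):
        # get_text(strip=True) drops the whitespace around each value
        rows.append([td.get_text(strip=True) for td in tr.find_all("td")])
    return rows


def rows_to_dict(rows):
    """Map each row label to its values, keyed by the periods in the header row."""
    header = rows[0][1:]  # the first header cell is empty, so skip it
    return {row[0]: dict(zip(header, row[1:])) for row in rows[1:]}
```

For example, data = rows_to_dict(table_to_rows(tmpDiv)) would let you look up a value by label and period, such as data["EPS ($)"]["Sep 2021"]. The values stay strings, so convert them with float() where you need numbers.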
Upvotes: 1