Reputation: 17
Here is a website I'm trying to scrape: https://opyn.co/#/buy
I am trying to grab a div, but for some reason an empty list is returned. There must be something I am doing wrong. Or perhaps a different approach is needed when dealing with nested divs on websites running frameworks like React?
Here is the code:
from bs4 import BeautifulSoup
import requests
url = 'https://opyn.co/#/buy'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
r = requests.get(url, headers=headers)
data = r.text
soup = BeautifulSoup(data, "lxml")
print(soup)
all_p = soup.find_all('div')
print(f"{all_p} | Status code: {r.status_code}")
What are my options? How can I get the content of nested divs that are rendered by React?
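For context, here is a minimal sketch of why `find_all('div')` comes back nearly empty. The HTML shell below is hypothetical, not the actual response from opyn.co: the server for a React single-page app typically returns only an empty mount-point div, and the content is filled in later by JavaScript running in the browser, which `requests` never executes.

```python
from bs4 import BeautifulSoup

# Hypothetical server response for a React single-page app: the HTML
# shell contains only an empty mount point; the visible content is
# rendered client-side by the JS bundle, which requests does not run.
html_shell = """
<html>
  <body>
    <div id="root"></div>
    <script src="/static/js/bundle.js"></script>
  </body>
</html>
"""

soup = BeautifulSoup(html_shell, "html.parser")
divs = soup.find_all("div")
print(len(divs))             # -> 1: only the empty root div is present
print(divs[0].text.strip())  # -> empty string: no rendered text to scrape
```

So the request and the parsing both succeed; there is simply no rendered content in the HTML that the server sends back.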
Upvotes: 0
Views: 69
Reputation: 195
As advised in the comments, you can use Selenium to get the dynamically rendered content:
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

chrome_options = Options()
chrome_options.add_argument("user-agent=Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36")

# With Selenium 4, pass the driver path via a Service object
# (the executable_path argument was removed from the Chrome constructor).
chrome_driver = webdriver.Chrome(service=Service("chromedriver.exe"), options=chrome_options)
chrome_driver.get("https://opyn.co/#/buy")
time.sleep(2)  # crude wait to make sure the JS call pulled the data you need

# find_elements_by_css_selector was removed in Selenium 4;
# use find_elements with a By locator instead.
divs = chrome_driver.find_elements(By.CSS_SELECTOR, 'div')
divs_content = [div.text for div in divs]
print(divs_content)

chrome_driver.quit()
You will need to download a driver executable that suits your browser: https://pypi.org/project/selenium/. For the example I provided you will need the Chrome webdriver: https://sites.google.com/a/chromium.org/chromedriver/downloads. Check which version of Chrome your PC is running, download the matching version of the executable, and place it in the same folder as your Python application.
Upvotes: 1