fazal

Reputation: 25

Scrape html only after data loads with delay using Python Requests?

I am learning data scraping with Python and have been using the Requests and BeautifulSoup4 libraries. This works well for ordinary HTML websites. But when I try to get data from websites where the data loads after a delay, I get an empty value. An example:

from bs4 import BeautifulSoup
from operator import itemgetter
from selenium import webdriver
url = "https://www.example.com/;1"
browser = webdriver.PhantomJS()
browser.get(url)
html = browser.page_source
soup = BeautifulSoup(html, 'lxml')
a = soup.find('span', 'buy')
print(a)

I am trying to grab the value from here: (value)

I have already looked at a similar question, "How to scrape html table only after data loads using Python Requests?", and tried to write my code along the lines of the solution given there, but it doesn't seem to work. I am a novice, so I need help getting this to work.

The table (content) is probably generated by JavaScript and thus can't be "seen". I am using Python 3.6 / PhantomJS / Selenium, as proposed by a lot of answers here.

Upvotes: 2

Views: 5781

Answers (2)

Roman Mindlin

Reputation: 842

You can get the desired values by requesting them directly from the API and parsing the JSON response.

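import requests
import json

# Ask the API for the data directly instead of parsing the rendered page
res = requests.get('https://api.example.com/api/')
# Decode the JSON response body into a Python object
d = json.loads(res.text)

print(d['market'])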
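
If the API is slow or returns an error, the snippet above can hang or fail with a confusing message. A slightly more defensive variant might look like the sketch below. The helper names `parse_market` and `fetch_market` and the 10-second timeout are my own assumptions for illustration; only the URL and the 'market' key come from the snippet above, and the sketch assumes the API returns a JSON object.

```python
import json


def parse_market(body):
    """Return the 'market' field of a JSON object body, or None if it is absent.

    Hypothetical helper; assumes the body decodes to a JSON object (dict).
    """
    payload = json.loads(body)
    return payload.get("market")


def fetch_market(url, timeout=10.0):
    """Request the API and return its 'market' field (hypothetical helper)."""
    # Third-party library; imported here so parse_market works without it.
    import requests

    res = requests.get(url, timeout=timeout)  # give up after `timeout` seconds
    res.raise_for_status()                    # raise on 4xx/5xx responses
    return parse_market(res.text)


if __name__ == "__main__":
    print(fetch_market("https://api.example.com/api/"))
```

Keeping the parsing separate from the request makes it easy to check the JSON handling against a saved response before hitting the live API.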

Upvotes: 0

songxunzhao

Reputation: 3169

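To scrape data that loads after a delay, you need a browser (optionally headless) that actually runs the page's JavaScript. Please use Selenium. Here is sample code; it uses the Chrome browser as the driver and waits for the element to appear before reading it.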

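from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

link = "https://www.example.com/;1"  # page URL from the question
# Placeholder: replace with the path to your chromedriver executable
browser = webdriver.Chrome(<chromedriver path here>)
browser.set_window_size(1120, 550)
try:
    browser.get(link)
    # Wait up to 3 seconds for the element to appear in the DOM
    element = WebDriverWait(browser, 3).until(
        EC.presence_of_element_located((By.ID, "blabla"))
    )
    data = element.get_attribute('data-blabla')
    print(data)
finally:
    # Close the browser even if the wait times out
    browser.quit()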
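
To make it clearer what the explicit wait above is doing, here is a rough, simplified stand-in written in plain Python. It is not Selenium's actual implementation: `wait_until` and `data_loaded` are hypothetical names of mine, and the real `WebDriverWait` also passes the driver to the condition and ignores certain exceptions while polling. The idea is the same, though: keep checking a condition at a fixed interval until it returns something truthy or the timeout expires.

```python
import time


def wait_until(condition, timeout=3.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Usage: simulate data that only "loads" on the third check.
attempts = {"count": 0}


def data_loaded():
    attempts["count"] += 1
    return "42.5" if attempts["count"] >= 3 else None


value = wait_until(data_loaded, timeout=2.0, poll=0.1)
print(value)
```

This is why a wait is preferable to a fixed `time.sleep`: it returns as soon as the data is there and fails loudly if it never shows up.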

Upvotes: 3
