Sherluck08

Reputation: 53

How to scrape a website that renders data with JavaScript

I am scraping this website: https://robertsspaceindustries.com/pledge/ship-upgrades?to-ship=173. I want to get the 'Arrow' text to the right of the 'choose your ship' text.

I have tried using requests and BeautifulSoup to select the tag that contains the text. When I inspect the page I can see the text inside the tag, but when I select it with soup.select(".name") I still get an empty string. The data may be rendered with JavaScript, so I tried Selenium and waited for the element to load before selecting it, but I still get nothing. Here's my code:

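from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()  # assumed; the original snippet did not show the driver setup
driver.get("https://robertsspaceindustries.com/pledge/ship-upgrades?to-ship=173")

try:
    # Wait up to 20 seconds for an element with class "name" to be present in the DOM
    element = WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.CLASS_NAME, "name"))
    )
    # Selenium 4 spelling of find_elements_by_css_selector(".name")
    select_tags = driver.find_elements(By.CSS_SELECTOR, ".name")
    for tag in select_tags:
        print(tag.text)
finally:
    driver.quit()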

Arrow

Upvotes: 1

Views: 1970

Answers (1)

Chris Hunt

Reputation: 4030

Selenium might be overkill for a task like this where you don't need to interact with the page. This is just a few lines with requests_html:

from requests_html import HTMLSession

url = 'https://robertsspaceindustries.com/pledge/ship-upgrades?to-ship=173'

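session = HTMLSession()
r = session.get(url)
# Run the page's JavaScript in headless Chromium (downloaded on the first call)
r.html.render()
# '.info > .name' matches only a .name element that is a direct child of .info
print(r.html.find('.info > .name', first=True).text)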

which produces Arrow as expected.

For this particular site, the information you want also appears elsewhere in the page content, so you can get it without any JavaScript support. For example:

import json

import requests

url = 'https://robertsspaceindustries.com/pledge/ship-upgrades?to-ship=173'

r = requests.get(url)
text = r.text

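# The ship list is embedded in the page source as a JSON array after 'fromShips: '
json_start_text = 'fromShips: '
json_start = text.index(json_start_text) + len(json_start_text)
# Assumes the first ']' after the marker closes the array (no nested arrays)
json_end = text.index(']', json_start)
json_text = text[json_start:json_end + 1]
data = json.loads(json_text)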
for ship in data:
    name = ship['name']
    msrp = ship['msrp']
    print(f'{name} {msrp}')

which results in

Aurora ES $20.00
P52 Merlin $20.00
Aurora MR $25.00
P72 Archimedes $30.00
Mustang Alpha $30.00
Aurora LX $30.00
...
Arrow $75.00
...
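
The snippet above slices the text up to the first ']' after the marker, which would cut the data short if an entry ever contained a nested array or a ']' inside a string. A minimal sketch of a more tolerant variant uses json.JSONDecoder().raw_decode, which parses one complete JSON value and ignores whatever follows it. The helper name extract_embedded_json and the sample string are made up for illustration; only the 'fromShips: ' marker comes from the code above.

```python
import json


def extract_embedded_json(text, marker):
    """Return the JSON value that starts immediately after `marker` in `text`."""
    start = text.index(marker) + len(marker)
    # raw_decode parses a single JSON value beginning at `start` and returns
    # (value, end_index); trailing non-JSON text is simply left unread.
    value, _end = json.JSONDecoder().raw_decode(text, start)
    return value


# Hypothetical stand-in for the page source, with a nested array in one entry
sample = (
    'var cfg = { fromShips: [{"name": "Arrow", "msrp": "$75.00", '
    '"tags": ["light"]}], toShip: 173 };'
)

ships = extract_embedded_json(sample, 'fromShips: ')
for ship in ships:
    print(ship['name'], ship['msrp'])
```

With the real page you would pass r.text instead of the sample string. Note that raw_decode does not skip leading whitespace, so the marker has to end right where the '[' begins.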

Upvotes: 2
