Reputation: 53
I am scraping this page: https://robertsspaceindustries.com/pledge/ship-upgrades?to-ship=173. I want to get the 'Arrow' text to the right of the 'choose your ship' text.
I tried using requests and BeautifulSoup to select the tag that contains the text. When I inspect the page I can see the text inside the element, but selecting it with soup.select(".name") still returns an empty string. The data may be rendered with JavaScript, so I tried Selenium and waited for the element to load before selecting it, but I still get nothing. Here's my code:
try:
    element = WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.CLASS_NAME, "name"))
    )
    select_tags = driver.find_elements_by_css_selector(".name")
    for tag in select_tags:
        print(tag.text)
finally:
    driver.quit()
Expected output: Arrow
Upvotes: 1
Views: 1970
Reputation: 4030
Selenium might be overkill for a task like this, where you don't need to interact with the page. It takes just a few lines with requests_html:
from requests_html import HTMLSession
url = 'https://robertsspaceindustries.com/pledge/ship-upgrades?to-ship=173'
session = HTMLSession()
r = session.get(url)
r.html.render()
print(r.html.find('.info > .name', first=True).text)
which produces Arrow, as expected.
For this particular site, the information is also available elsewhere in the page content, so JavaScript support isn't required. For example, the ship list is embedded as JSON right after the text fromShips: in the page source:
import json
import requests

url = 'https://robertsspaceindustries.com/pledge/ship-upgrades?to-ship=173'
r = requests.get(url)
text = r.text

# Locate the JSON array that follows 'fromShips: ' in the page source
json_start_text = 'fromShips: '
json_start = text.index(json_start_text) + len(json_start_text)
# The array is taken to end at the first ']' after its start
json_end = text.index(']', json_start)
json_text = text[json_start:json_end + 1]

data = json.loads(json_text)
for ship in data:
    name = ship['name']
    msrp = ship['msrp']
    print(f'{name} {msrp}')
which results in
Aurora ES $20.00
P52 Merlin $20.00
Aurora MR $25.00
P72 Archimedes $30.00
Mustang Alpha $30.00
Aurora LX $30.00
...
Arrow $75.00
...
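One caveat with slicing up to the first ']': it would cut the array short if a ship entry ever contained a nested list. A minimal sketch of a more tolerant variant uses json.JSONDecoder().raw_decode, which parses one complete JSON value starting at a given position and ignores whatever follows. The helper name extract_json_after and the sample string (including its "tags" field) are made up for illustration; the real page source may look different.

```python
import json


def extract_json_after(text, marker):
    """Return the JSON value that starts right after `marker` in `text`."""
    start = text.index(marker) + len(marker)
    # raw_decode parses exactly one JSON value beginning at `start` and
    # returns it together with the index where parsing stopped, so nested
    # brackets inside the array do not end the parse early.
    value, _end = json.JSONDecoder().raw_decode(text, start)
    return value


# Made-up sample standing in for the page source (not the real site content)
sample = (
    'var config = { fromShips: [{"name": "Arrow", "msrp": "$75.00", '
    '"tags": ["light"]}], toShips: [] };'
)

ships = extract_json_after(sample, 'fromShips: ')
for ship in ships:
    print(ship['name'], ship['msrp'])
```

With the real page, you would pass r.text and the same 'fromShips: ' marker; the marker must end exactly where the JSON value begins, since raw_decode does not skip leading whitespace.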
Upvotes: 2