spaceandtime

Reputation: 35

Parsing text with bs4 works with selenium but does not work with requests in Python

This code works and returns the single-digit number that I want, but it is slow: it takes a good 10 seconds to complete. I will be running it 4 times, so that is 40 seconds wasted on every run.

from selenium import webdriver
from bs4 import BeautifulSoup

options = webdriver.FirefoxOptions()
options.add_argument('--headless')
driver = webdriver.Firefox(options=options)

driver.get('https://warframe.market/items/ivara_prime_blueprint')

html = driver.page_source

soup = BeautifulSoup(html, 'html.parser')

price_element = soup.find('div', {'class': 'row order-row--Alcph'})
price2 = price_element.find('div', {'class': 'order-row__price--hn3HU'})

price = price2.text

print(int(price))

driver.close()

This code, on the other hand, does not work. It returns None.

import requests
from bs4 import BeautifulSoup

url = 'https://warframe.market/items/ivara_prime_blueprint'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

price_element = soup.find('div', {'class': 'row order-row--Alcph'})
price2 = price_element.find('div', {'class': 'order-row__price--hn3HU'})

price = price2.text

print(int(price))

My first thought was to add a user agent, but it still did not work. When I print(soup) I get HTML, but when I parse it further it starts giving me None, even though it is the same command as in the Selenium example.

Upvotes: 1

Views: 118

Answers (1)

MendelG

Reputation: 20038

The data is embedded as JSON inside a <script> tag and rendered into the page by JavaScript, so BeautifulSoup doesn't see the rendered elements in the response from requests (it doesn't execute JavaScript).
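To make the difference concrete, here is a rough sketch using only the standard library's `HTMLParser` (swapped in for BeautifulSoup so it runs without extra installs). `RAW_HTML` is a made-up stand-in for what requests receives, not the real page, and `StateScriptParser` is a hypothetical helper name. It shows that the order-row div is absent from the raw HTML while the JSON inside the script tag is present.

```python
import json
from html.parser import HTMLParser

# Made-up stand-in for the raw HTML returned by requests (NOT the real page).
RAW_HTML = """
<html><body>
<div id="application"></div>
<script type="application/json" id="application-state">
{"payload": {"orders": [{"user": {"ingame_name": "example_user"}}]}}
</script>
</body></html>
"""


class StateScriptParser(HTMLParser):
    """Collects div class names and the text of the #application-state script."""

    def __init__(self):
        super().__init__()
        self.in_state = False
        self.state_text = ""
        self.div_classes = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("id") == "application-state":
            self.in_state = True
        elif tag == "div":
            self.div_classes.append(attrs.get("class") or "")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_state = False

    def handle_data(self, data):
        # Script content arrives here as plain text.
        if self.in_state:
            self.state_text += data


parser = StateScriptParser()
parser.feed(RAW_HTML)
parser.close()

# No rendered order rows exist until JavaScript runs in a browser.
has_order_row = any("order-row" in c for c in parser.div_classes)
state = json.loads(parser.state_text)

print(has_order_row)
print(state["payload"]["orders"][0]["user"]["ingame_name"])
```

This is why the same `soup.find(...)` call succeeds on Selenium's `page_source` (taken after rendering) but comes back empty on `response.text`.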

As an example, to get the data, you can use:

import json
import requests
from bs4 import BeautifulSoup


url = "https://warframe.market/items/ivara_prime_blueprint"
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"
}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")

script_tag = soup.select_one("#application-state")

json_data = json.loads(script_tag.string)
# Uncomment the line below to see all the data
# from pprint import pprint
# pprint(json_data)

for data in json_data["payload"]["orders"]:
    print(data["user"]["ingame_name"])

Prints:

Rogue_Monarch
Rappei
KentKoes
Tenno61189
spinifer14
Andyfr0nt
hollowberzinho

You can access the data as a dict and access its keys/values.

I'd recommend an online tool to view all the JSON since it's quite large.
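Since the question is after a price rather than user names, here is a minimal sketch of picking the lowest sell price out of the orders list. The keys `"order_type"` and `"platinum"` are assumptions about the JSON layout and are not confirmed above, so check them against the `pprint` output first. `lowest_sell_price` is a hypothetical helper and `sample_orders` is made-up data standing in for `json_data["payload"]["orders"]`.

```python
def lowest_sell_price(orders):
    """Return the smallest price among sell orders, or None if there are none.

    Assumes each order is a dict with "order_type" and "platinum" keys
    (an assumption about the payload, verify it with pprint).
    """
    prices = [
        order["platinum"]
        for order in orders
        if order.get("order_type") == "sell" and "platinum" in order
    ]
    return min(prices) if prices else None


# Made-up sample shaped like the assumed payload, for illustration only.
sample_orders = [
    {"order_type": "sell", "platinum": 12, "user": {"ingame_name": "example_a"}},
    {"order_type": "buy", "platinum": 5, "user": {"ingame_name": "example_b"}},
    {"order_type": "sell", "platinum": 9, "user": {"ingame_name": "example_c"}},
]

print(lowest_sell_price(sample_orders))
```

With the real data you would call `lowest_sell_price(json_data["payload"]["orders"])`. Returning None for an empty list avoids the `ValueError` that `min()` raises on an empty sequence.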

See also

Parsing out specific values from JSON object in BeautifulSoup

Upvotes: 2
