Jaybay

Reputation: 23

Parsing Webpage with BeautifulSoup Doesn't Give Full Page Contents

I'm trying to parse the description 'Enjoy the power to create and control...' from this webpage: https://www.origin.com/zaf/en-us/store/the-sims/the-sims-4.

When I parse the page with BeautifulSoup, the resulting page source doesn't include the description, and I'm not sure why.

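import requests
from bs4 import BeautifulSoup
from googlesearch import search  # assumed: the `google` package, whose search() takes these arguments

handle = 'sims 4'

query = handle + " origin.com"  # enter query to search
print(query)
for topresult in search(query, tld="com", lang='en', num=10, stop=1, pause=2):
    print('Query Successful:' + handle)

page = requests.get(topresult)
soup = BeautifulSoup(page.text, 'html.parser')  # pass the HTML text, not the Response object

print(soup)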

Any help would be appreciated. I've been trying to figure this out for a couple of days. I've also tried using Selenium with the Chrome driver, but I get a similar result.

Upvotes: 2

Views: 799

Answers (1)

LuckyZakary

Reputation: 1191

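Requests and BeautifulSoup will not work here because the page content is loaded dynamically with JavaScript. Requests only downloads the initial HTML and never runs the scripts that fill in the description, which is why you cannot find it. Selenium WebDriver should work just fine, provided you wait for the element to appear before reading it. I wrote some code to get the description.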


from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()

driver.get('https://www.origin.com/zaf/en-us/store/the-sims/the-sims-4')
# Wait up to 10 seconds for JavaScript to insert the description element
desc = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//p[@ng-bind-html="::$ctrl.description"]')))
print(desc.text)
driver.quit()  # close the browser when done
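
To see why the requests approach comes back without the description, here is a rough sketch using only the standard library's html.parser (the same backend that 'html.parser' selects in BeautifulSoup). The STATIC_HTML string is a made-up simplification, not Origin's real markup; only the ng-bind-html attribute is taken from the XPath above. The ParagraphText class and paragraph_text helper are hypothetical names for illustration.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for what requests.get(...).text returns on a
# JavaScript-rendered page: the <p> placeholder exists but is empty, and
# the description only lives inside a script that a browser would run.
STATIC_HTML = """
<html><body>
  <p ng-bind-html="::$ctrl.description"></p>
  <script>
    document.querySelector('p').innerHTML = 'Enjoy the power to create and control...';
  </script>
</body></html>
"""


class ParagraphText(HTMLParser):
    """Collect the text found inside <p> elements only."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        # Script contents also arrive here, but in_p is False for them.
        if self.in_p:
            self.chunks.append(data)


def paragraph_text(html):
    """Return the concatenated text of all <p> elements in the given HTML."""
    parser = ParagraphText()
    parser.feed(html)
    return "".join(parser.chunks).strip()


# The parser never executes the script, so the placeholder stays empty.
print(repr(paragraph_text(STATIC_HTML)))  # prints ''
```

A parser only reads the markup it is given, so the text assigned by the script never shows up. A real browser driven by Selenium runs that script, which is why waiting for the element and then reading desc.text returns the description.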

Upvotes: 1
