Neil S

Reputation: 229

Python BeautifulSoup html.parser not working

I have a script that pulls book information from Amazon. It was running successfully before but failed today. I can't figure out exactly what is going wrong, but I assume it is related to the parser or to JavaScript. I am using the code below.

from bs4 import BeautifulSoup
import requests

response = requests.get('https://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Dstripbooks&field-keywords=9780307397980',headers={'User-Agent': b'Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'})
html = response.content
soup = BeautifulSoup(html, "html.parser")
resultcol = soup.find('div', attrs={'id':'resultsCol'})

Previously I got data in resultcol, but now it is empty. When I check html, I see the tag I am looking for, i.e. <div id="resultsCol" class=''>, but soup does not contain it. Can anyone help me debug this? It was working perfectly fine before, but now it is not.
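
One way to narrow this down is to check separately whether the tag is present in the raw response and whether a parser actually sees it as a start tag. The sketch below is only an illustration built on the standard library's HTMLParser; the DivIdFinder class, the diagnose helper, and the sample markup are made-up names and data, not part of the original script.

```python
from html.parser import HTMLParser


class DivIdFinder(HTMLParser):
    """Records whether a <div> start tag with the wanted id is seen."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a value may be None.
        if tag == "div" and dict(attrs).get("id") == self.wanted_id:
            self.found = True


def diagnose(html_bytes, wanted_id):
    """Return (in_raw_text, seen_by_parser) for the given div id."""
    text = html_bytes.decode("utf-8", errors="replace")
    in_raw_text = ('id="%s"' % wanted_id) in text
    finder = DivIdFinder(wanted_id)
    finder.feed(text)
    finder.close()
    return in_raw_text, finder.found


# Made-up sample markup standing in for response.content.
sample = b"<html><body><div id=\"resultsCol\" class=''>x</div></body></html>"
print(diagnose(sample, "resultsCol"))
```

If the first value is True but the second is False, the text is probably sitting inside a comment or script, or the markup before it is confusing the parser. If both are False, the server likely returned a different page than expected.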

Upvotes: 2

Views: 6910

Answers (2)

Vishvajit Pathak

Reputation: 3731

You need to wait until the page has loaded completely. Use PhantomJS (driven through Selenium) to make sure the page is rendered before you parse it.

I was able to get the correct element with the following code.

from bs4 import BeautifulSoup
from selenium import webdriver

url = ("https://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3D"
       "stripbooks&field-keywords=9780307397980")

# Let a headless browser load the page so that its JavaScript runs.
browser = webdriver.PhantomJS()
browser.get(url)
html = browser.page_source
browser.quit()

# Parse the rendered HTML (the 'lxml' parser requires the lxml package).
soup = BeautifulSoup(html, 'lxml')
resultcol = soup.find('img', attrs={'class': 's-access-image'})
print(resultcol)
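
The "wait until the page is loaded" step can be made explicit by polling for the element instead of reading the page source straight away. Below is a rough, library-independent sketch; wait_until and its parameters are hypothetical names made up for illustration, and Selenium ships its own explicit-wait helper (WebDriverWait) that would normally be the better choice.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Poll predicate() until it returns a truthy value or timeout expires.

    Returns the truthy value, or None if the timeout is reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


# Stand-in for a page that only contains the element on the third check.
checks = {"count": 0}


def fake_page_ready():
    checks["count"] += 1
    return checks["count"] >= 3


print(wait_until(fake_page_ready, timeout=2.0, interval=0.01))
```

With the browser object from the code above, this could be called right after browser.get(url), for example as wait_until(lambda: 's-access-image' in browser.page_source), before the page source is handed to BeautifulSoup.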

Upvotes: 1

Guar

Reputation: 13

Remove the headers argument from the request and it should work.

from bs4 import BeautifulSoup
import requests
response = requests.get('https://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Dstripbooks&field-keywords=9780307397980')
html = response.content
soup = BeautifulSoup(html, "html.parser")
resultcol = soup.find('div', attrs={'id':'resultsCol'})

Upvotes: 1
