Reputation: 55
I am learning Python from a Udemy tutorial (total newbie to Python). I am currently at the Beautiful Soup section, where the exercise is to scrape the price of the author's book from Amazon. My code is below:
import bs4, requests
url = 'https://www.amazon.com/Automate-Boring-Stuff-Python-Programming/dp/1593275994/'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
response = requests.get(url, headers=headers)
response.raise_for_status()
soup = bs4.BeautifulSoup(response.text, 'html.parser')
soup.select('#addToCart > a > h5 > div > div.a-column.a-span4.a-text-right.a-span-last > span.a-size-medium.a-color-price.header-price')
When I inspect the price element in the browser, I can see this:
<span class="a-size-medium a-color-price header-price">
$25.45
</span>
However, when I copy and paste that path into soup.select and run the Python command, I only get back [], i.e. an empty list. I should be getting the contents of the second code box.
UPDATE: While I was typing this question, it did display the result correctly (the contents of the box with $25.45), but 5 minutes later it went back to returning only []. I am behind a proxy and have also tried without going through the proxy, with no change in results. I don't get any error from response.raise_for_status() either. Can someone please assist?
(Note that I don't intend to screen scrape any commercial site out there; I would very much like to apply what I learn to in-house scenarios.)
Thank you!
Upvotes: 0
Views: 1533
Reputation: 474041
You are over-complicating your CSS selector and making it fragile - heavily dependent on the page layout. You don't have to go through the complete parent-child chain to locate an element. Choose the most reliable, readable and appropriate points you can base your locator on. For instance, in this case, the following works for me:
soup.select('#addToCart .header-price')
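To illustrate why the shorter selector is more robust, here is a minimal offline sketch. The markup is a hypothetical, simplified stand-in for the real page (which is much larger and changes often), and the names build_page, extract_price and the "wrapper" div are made up for the example. It also guards against an empty result instead of assuming the element is always there:

```python
import bs4

# The full parent-child chain from the question.
LONG_SELECTOR = ('#addToCart > a > h5 > div > '
                 'div.a-column.a-span4.a-text-right.a-span-last > '
                 'span.a-size-medium.a-color-price.header-price')
# The shorter selector: any .header-price element anywhere inside #addToCart.
SHORT_SELECTOR = '#addToCart .header-price'


def build_page(extra_wrapper=False):
    """Return hypothetical, simplified markup; optionally add one extra wrapper div."""
    inner = ('<a><h5><div>'
             '<div class="a-column a-span4 a-text-right a-span-last">'
             '<span class="a-size-medium a-color-price header-price">'
             '\n$25.45\n'
             '</span></div></div></h5></a>')
    if extra_wrapper:
        # Simulates a small layout change on the page.
        inner = '<div class="wrapper">' + inner + '</div>'
    return '<div id="addToCart">' + inner + '</div>'


def extract_price(html, selector=SHORT_SELECTOR):
    """Return the stripped text of the first match, or None if nothing matches."""
    soup = bs4.BeautifulSoup(html, 'html.parser')
    matches = soup.select(selector)
    if not matches:
        return None
    return matches[0].get_text(strip=True)


original = build_page()
changed = build_page(extra_wrapper=True)

print(extract_price(original, LONG_SELECTOR))   # matches the original layout
print(extract_price(changed, LONG_SELECTOR))    # None: the child chain is broken
print(extract_price(changed, SHORT_SELECTOR))   # still finds the price
```

Both selectors match the original layout, but only the short one survives the extra wrapper. If extract_price returns None against the live page, the element is simply not in the HTML that requests received, so it is worth inspecting response.text directly rather than relying on what the browser shows.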
Upvotes: 1