Reputation: 45
I'm following a Udemy course on learning BS4, and it seems to be a bit outdated, so I'm having trouble with this part.
The objective is to scrape the price of this TV from this Amazon page. In the course, the instructor also gets this error and says he fixed it by changing the class name he searches for with findAll. I tried the same approach (with a different class, not the one he used) and got the AttributeError again. According to the answer to a similar question, the class being searched for didn't contain the element being looked for, but I don't believe that's what is happening in my case.
The code: https://pastebin.com/SMQBXt31 `
from datetime import datetime
import requests
import csv
import bs4

USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.3 Safari/605.1.15"
REQUEST_HEADER = {
    "User-Agent": USER_AGENT,
    "Accept-Language": "en-US, en;q=0.5"
}


def get_page_html(url):
    res = requests.get(url=url, headers=REQUEST_HEADER)  # res = response
    return res.content


def get_product_price(soup):
    main_price_span = soup.find("span", attrs={
        "class": "a-price aok-align-center reinventPricePriceToPayPadding priceToPay"
    })
    price_spans = main_price_span.findAll("span")
    for span in price_spans:
        price = span.text.strip().replace("$", "").replace(",", "")
        print(price)


def extract_product_info(url):
    product_info = {}
    print(f"Scraping URL: {url}")
    html = get_page_html(url)
    soup = bs4.BeautifulSoup(html, "lxml")
    product_info["price"] = get_product_price(soup)


if __name__ == '__main__':
    with open("amazon_products_urls.csv", newline="") as csvfile:
        reader = csv.reader(csvfile, delimiter=",")
        for row in reader:
            url = row[0]
            print(extract_product_info(url))
`
Upvotes: 0
Views: 44
Reputation: 16187
There are a lot of spans inside that element.
From those, you have to select only the span that holds the price, which is located in [class="a-size-mini olpWrapper"]
price_spans = main_price_span.find_all("span", class_="a-size-mini olpWrapper")
for span in price_spans:
    price = span.text.strip().replace("$", "").replace(",", "")
    print(price)
# OR
price_spans = [x.get_text(strip=True).replace("$", "") for x in main_price_span.find_all("span", class_="a-size-mini olpWrapper")]
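Note that an AttributeError at this point usually means soup.find returned None, so the call on main_price_span fails before any inner span is inspected. Passing the full class string to find only matches when the class attribute is exactly that string, so a reordered or added class makes the lookup return None. Below is a minimal sketch of a more defensive lookup. The sample HTML and the class names a-price, priceToPay, and a-offscreen are assumptions for illustration only; the real page markup may differ and changes over time.

```python
import bs4

# Hypothetical HTML standing in for a product page (assumption, not real Amazon markup).
SAMPLE_HTML = """
<div>
  <span class="a-price aok-align-center priceToPay">
    <span class="a-offscreen">$1,299.99</span>
    <span aria-hidden="true">$1,299.99</span>
  </span>
</div>
"""


def get_product_price(soup):
    # A CSS selector matches individual classes, so extra or reordered
    # classes on the element do not break the lookup.
    main_price_span = soup.select_one("span.a-price.priceToPay")
    if main_price_span is None:
        # Nothing matched (changed markup, captcha page, ...): report it
        # instead of raising AttributeError on None.
        return None

    # Assumed inner span that carries the full price text.
    offscreen = main_price_span.find("span", class_="a-offscreen")
    if offscreen is None:
        return None

    text = offscreen.get_text(strip=True).replace("$", "").replace(",", "")
    try:
        return float(text)
    except ValueError:
        return None


soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")
price = get_product_price(soup)

# A page without the price element yields None rather than an exception.
empty = get_product_price(bs4.BeautifulSoup("<p>captcha</p>", "html.parser"))
```

If the function returns None for a real URL, it may help to print part of the downloaded HTML and check whether the page actually contains the price element or is a bot-check page instead.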
Upvotes: 1