JuicyKevin

Reputation: 45

BS4 - 'NoneType' object has no attribute 'findAll' when scanning spans on an Amazon page

I'm following a Udemy course on BS4, and it seems to be a bit outdated, so I'm having trouble with this part.

The objective is to scrape the price of a TV from the Amazon page linked below. In the course, the instructor hits the same error and fixes it by changing the class name he searches for via findAll. I tried the same approach (with a different class, not the one he used) and got the attribute error again. According to the answer to a similar question, the class being searched for didn't contain what was being looked for, but I don't believe that's what's happening here.

The code: https://pastebin.com/SMQBXt31

from datetime import datetime
import requests
import csv
import bs4

USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.3 Safari/605.1.15"
REQUEST_HEADER = {
    "User-Agent": USER_AGENT,
    "Accept-Language": "en-US, en;q=0.5"
}

def get_page_html(url):
    res = requests.get(url=url, headers=REQUEST_HEADER) #res = response
    return res.content

def get_product_price(soup):
    main_price_span = soup.find("span", attrs={
        "class": "a-price aok-align-center reinventPricePriceToPayPadding priceToPay"
    })
    price_spans = main_price_span.findAll("span")
    for span in price_spans:
        price = span.text.strip().replace("$", "").replace(",", "")
        print(price)

def extract_product_info(url):
    product_info = {}
    print(f"Scraping URL: {url}")
    html = get_page_html(url)
    soup = bs4.BeautifulSoup(html, "lxml")
    product_info["price"] = get_product_price(soup)

if __name__ == '__main__':
    with open("amazon_products_urls.csv", newline="") as csvfile:
        reader = csv.reader(csvfile, delimiter=",")
        for row in reader:
            url = row[0]
        print(extract_product_info(url))


The website: https://www.amazon.com/Hisense-Premium-65-Inch-Compatibility-65U8G/dp/B091XWTGXL/ref=sr_1_1_sspa?crid=3NYCKNFHL6DU2&keywords=hisense%2Bpremium%2B65%2Binch&qid=1651840513&sprefix=hisense%2Bpremium%2B65%2Binch%2B%2Caps%2C116&sr=8-1-spons&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUEyVzUyTjBMS1JCVFVRJmVuY3J5cHRlZElkPUEwNDY2ODc0MlozVlFMVFJKQ0s2VyZlbmNyeXB0ZWRBZElkPUEwODI5OTgxMTRZSjdMMzYyQjk4NyZ3aWRnZXROYW1lPXNwX2F0ZiZhY3Rpb249Y2xpY2tSZWRpcmVjdCZkb05vdExvZ0NsaWNrPXRydWU&th=1
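
As a sanity check, here is a minimal sketch (reusing get_page_html and the product URL above) that reports whether the outer price span is found at all before findAll is called. If it isn't, either the class name doesn't match the current page or Amazon served a bot-check page instead of the product page:

import bs4

# Shortened form of the product URL from the question
url = "https://www.amazon.com/Hisense-Premium-65-Inch-Compatibility-65U8G/dp/B091XWTGXL"
html = get_page_html(url)
soup = bs4.BeautifulSoup(html, "lxml")

# Try to locate the outer wrapper span the course searches for
main_price_span = soup.find("span", attrs={
    "class": "a-price aok-align-center reinventPricePriceToPayPadding priceToPay"
})

if main_price_span is None:
    # Either the class name is wrong/outdated, or the request was blocked
    print("Outer price span not found. Page title:", soup.title)
else:
    print("Found it; inner spans:", main_price_span.findAll("span"))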

Upvotes: 0

Views: 44

Answers (1)

Md. Fazlul Hoque

Reputation: 16187

There are a lot of spans there; you have to select only the price span, which is located in [class="a-size-mini olpWrapper"]:

price_spans = main_price_span.find_all("span",class_="a-size-mini olpWrapper")
for span in price_spans:
    price = span.text.strip().replace("$", "").replace(",", "")
    print(price)

#OR

price_spans = [x.get_text(strip=True).replace("$", "").replace(",", "") for x in main_price_span.find_all("span", class_="a-size-mini olpWrapper")]
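
For completeness, a rough sketch of how this could slot back into the asker's get_product_price (assuming the outer wrapper class from the question still matches, and returning None instead of crashing when it doesn't):

def get_product_price(soup):
    # Outer wrapper span that holds the displayed price
    main_price_span = soup.find("span", attrs={
        "class": "a-price aok-align-center reinventPricePriceToPayPadding priceToPay"
    })
    if main_price_span is None:
        return None  # class changed or the page wasn't served in full
    # Inner span(s) that carry the price text
    price_spans = main_price_span.find_all("span", class_="a-size-mini olpWrapper")
    for span in price_spans:
        price = span.text.strip().replace("$", "").replace(",", "")
        if price:
            return price
    return None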

Upvotes: 1
