Joell

Reputation: 75

Parsing product title on Amazon using Requests

Source Code pulled from Amazon Best Seller List:

<div class="p13n-sc-truncate p13n-sc-line-clamp-2" aria-hidden="true" data-rows="2">
        Fingerlings Light Up Unicorn - Mackenzie (White) - Friendly Interactive Toy by WowWee
    </div>

........ omitted code here..........

    <div class="p13n-sc-truncate p13n-sc-line-clamp-2" aria-hidden="true" data-rows="2">
        Rocketbook Everlast Reusable Smart Notebook, Executive Size
    </div>

I want to go to Amazon.com and pull the names of all the best-selling items on the page. The code above is the source pulled from the current page (the page updates hourly, so the item names change, but the class stays the same). So in this case I want it to pull the names:

"Rocketbook Everlast Reusable Smart Notebook, Executive Size" as well as "Fingerlings Light Up Unicorn - Mackenzie (White) - Friendly Interactive Toy by WowWee".

I was planning on doing it like this:

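import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.amazon.com/Best-Sellers-Amazon-Launchpad/zgbs/boost/ref=zg_bs_nav_0")
soup = BeautifulSoup(r.text, "lxml")  # parse the response that was actually fetched
n = soup.find("div", {"class": "p13n-sc-truncate"})  # returns the first match only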

I think this approach will not work, for two reasons. First, the class is found all over the page source, so this will most likely produce an error because there are too many matches. Second, how will the text from that class even come out? Will it contain just the product name and nothing else?

Upvotes: 1

Views: 225

Answers (1)

alecxe

Reputation: 473863

Right, this class is a bit too generic for this page. What you could do is first identify the containers where the best seller items are located. For instance, it could be:

soup.select("ol#zg-ordered-list > li")

Now you can operate inside the item containers only, which greatly narrows the search scope:

for product in soup.select("ol#zg-ordered-list > li"):
    product_name = product.select_one(".p13n-sc-truncate").get_text(strip=True)
    print(product_name)
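
To see what the scoped lookup returns without hitting Amazon, here is a minimal offline sketch. The HTML string is a hypothetical, trimmed-down stand-in for the page, and the `extract_titles` helper is an illustrative name, not part of any library. The class name follows the source quoted in the question. It uses the built-in `html.parser` so that no extra parser is needed. `get_text(strip=True)` drops the surrounding whitespace, so only the product name comes back, and items without a title element are skipped instead of raising an error.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
SAMPLE_HTML = """
<ol id="zg-ordered-list">
  <li>
    <div class="p13n-sc-truncate p13n-sc-line-clamp-2" aria-hidden="true" data-rows="2">
        Fingerlings Light Up Unicorn - Mackenzie (White) - Friendly Interactive Toy by WowWee
    </div>
  </li>
  <li>
    <div class="p13n-sc-truncate p13n-sc-line-clamp-2" aria-hidden="true" data-rows="2">
        Rocketbook Everlast Reusable Smart Notebook, Executive Size
    </div>
  </li>
  <li><span>No title element here</span></li>
</ol>
<div class="p13n-sc-truncate">Unrelated item outside the list</div>
"""


def extract_titles(html):
    """Return the stripped title text of each list item that has one."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for product in soup.select("ol#zg-ordered-list > li"):
        title_tag = product.select_one(".p13n-sc-truncate")
        if title_tag is None:
            continue  # skip items without a title element
        titles.append(title_tag.get_text(strip=True))
    return titles


titles = extract_titles(SAMPLE_HTML)
for title in titles:
    print(title)
```

The unrelated `div` outside the list shares the class but is never visited, because the loop only looks inside the `li` containers.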

Or, you could grab the product title from the alt attribute of the product image:

for product in soup.select("ol#zg-ordered-list > li"):
    product_name = product.select_one("img[alt]")["alt"]
    print(product_name)
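
The same idea can be tried offline for the alt-attribute variant. This is only a sketch on hypothetical sample markup (the image file names and list contents are made up, and the real page's structure may differ). The `img[alt]` selector matches only images that carry an `alt` attribute, so checking for `None` guards against items with no usable image.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page's structure may differ.
SAMPLE_HTML = """
<ol id="zg-ordered-list">
  <li><img src="a.jpg" alt="Rocketbook Everlast Reusable Smart Notebook, Executive Size"></li>
  <li><img src="b.jpg"></li>
  <li><span>No image at all</span></li>
</ol>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
alt_titles = []
for product in soup.select("ol#zg-ordered-list > li"):
    # Matches the first <img> inside this item that has an alt attribute.
    image = product.select_one("img[alt]")
    if image is not None:
        alt_titles.append(image["alt"].strip())

print(alt_titles)
```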

Upvotes: 2
