How to scrape the new format for Product information on Amazon.com using BeautifulSoup?

Question

In this post, alecxe gives a solution for scraping the Product information/Product details table on Amazon.com. However, that description table uses a different format from the one used by many of the newer items listed on Amazon.

The old format, which you can see here, is different from the new format here.

What I tried: In the code given by alecxe he uses

for li in soup.select('table#productDetailsTable div.content ul li'):

I tried changing this to (and removed everything after it):

for tr in soup.select('table#productDetails_detailBullets_sections1 tbody tr'):
    print text.tr
    print(repr(tr))

to see if I would be able to extract at least something from the product information table. However, nothing printed.

I also tried the find_all() and find() functions, but I was unable to extract what I needed or anything close to it.

My issue with figuring this out is caused by the structure of the HTML for the new tables. It looks something like:


.
.
.    

.
. 
.


    
        Best Sellers Rank
    
    
         

                #8,740 in Toys & Games (See Top 100 in Toys & Games)
        


                #67 in Toys & Games > Puzzles > Jigsaw Puzzles
        


                #87 in Toys & Games > Preschool > Pre-Kindergarten Toys > Puzzles

If I want to extract just the seller rank for "Toys & Games > Puzzles > Jigsaw Puzzles", how am I supposed to do that? (In this case, that is the text in the second inner <span> in the HTML above.)

t.m.adam · Accepted Answer

I could make your code work with some small adjustments:

Remove 'tbody' from the soup.select selector; it's a tag generated by the browser, so it may not be present in the HTML that BeautifulSoup parses
Print tr.text, not text.tr
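
The first adjustment can be checked on a small sample. This is only a sketch: SAMPLE_HTML is a hypothetical, simplified stand-in for the real Amazon markup (which was not preserved in the question), and it assumes Python's built-in "html.parser", which does not insert a tbody the way a browser does.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page has more rows and attributes.
SAMPLE_HTML = """
<table id="productDetails_detailBullets_sections1">
  <tr>
    <th>Best Sellers Rank</th>
    <td>
      <span>
        <span>#8,740 in Toys &amp; Games (See Top 100 in Toys &amp; Games)</span>
        <span>#67 in Toys &amp; Games &gt; Puzzles &gt; Jigsaw Puzzles</span>
      </span>
    </td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The raw HTML has no <tbody>, so a selector that requires one matches nothing.
rows_with_tbody = soup.select(
    "table#productDetails_detailBullets_sections1 tbody tr"
)

# Without 'tbody', the selector matches the row as it appears in the source.
rows_without_tbody = soup.select(
    "table#productDetails_detailBullets_sections1 tr"
)

print(len(rows_with_tbody))     # 0
print(len(rows_without_tbody))  # 1
```

Parsers differ on this point (html5lib, for example, adds a tbody much like a browser would), so the shorter selector is the safer choice.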

Code:

for tr in soup.select('table#productDetails_detailBullets_sections1 tr'):
    if 'Jigsaw Puzzles' in tr.text:
        print(tr.text.strip())

Or, if you prefer find / find_all:

for tr in soup.find('table', id='productDetails_detailBullets_sections1').find_all('tr'):
    if 'Jigsaw Puzzles' in tr.text:
        for span in tr.find('span').find_all('span'):
            if 'Jigsaw Puzzles' in span.text:
                print(span.text.strip())
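
If the rank is needed as a number rather than as text, the same idea can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: find_rank, the sample markup and the regular expression are all hypothetical, and it assumes each rank sits in its own innermost span and starts with "#".

```python
import re

from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the real product page.
SAMPLE_HTML = """
<table id="productDetails_detailBullets_sections1">
  <tr>
    <th>Best Sellers Rank</th>
    <td>
      <span>
        <span>#8,740 in Toys &amp; Games (See Top 100 in Toys &amp; Games)</span>
        <span>#67 in Toys &amp; Games &gt; Puzzles &gt; Jigsaw Puzzles</span>
      </span>
    </td>
  </tr>
</table>
"""


def find_rank(soup, category, table_id="productDetails_detailBullets_sections1"):
    """Return the rank for `category` as an int, or None if it is not found."""
    table = soup.find("table", id=table_id)
    if table is None:
        return None
    for span in table.find_all("span"):
        # Skip wrapper spans; only innermost spans hold a single rank line.
        if span.find("span") is not None:
            continue
        text = span.get_text(strip=True)
        if category in text:
            match = re.match(r"#([\d,]+)", text)
            if match:
                # Drop thousands separators before converting, e.g. "8,740".
                return int(match.group(1).replace(",", ""))
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(find_rank(soup, "Jigsaw Puzzles"))  # 67
print(find_rank(soup, "Board Games"))     # None
```

Returning None when the table or category is missing avoids the AttributeError that a chained soup.find(...).find_all(...) raises when the table is absent from the page.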
