Lukemul69

Reputation: 187

How to use BeautifulSoup to get the image from a webpage

I am struggling to get an image from this webpage. I am able to get the title, price and other elements fine, just not the image.

<div class="product-img">
   <a data-test-selector="linkProductURL" href="https://www.scottycameron.com/store/product/3494">
      <div class="image" style="min-height: 350px;">
         <img data-test-selector="imgProductImage" id="img-3494" class="img-responsive b-lazy b-loaded"
         src="https://api.scottycameron.com/Data/Media/Catalog/1/370/c09b7470-42dd-47e5-a244-9ef3d073c742LICENSE%20PLATE%20FRAME%20-%20SCOTTY%20CAMERON%20FINE%20MILLED%20PUTTERS.jpg">

The code I am currently using is:

for ele in array:
    item = [ele.find('h4', {'class': 'title'}).text,  # title
            ele.find('span', {'data-test-selector': 'spanPrice'}).text,  # price
            ele.find('img', {'class': 'img-responsive b-lazy b-loaded'})['src']]  # image

But that raises:

TypeError: 'NoneType' object is not subscriptable

Does anyone have any idea?

Upvotes: 1

Views: 109

Answers (2)

Using this code, you can collect every image on the page:

from urllib.request import urlopen
from bs4 import BeautifulSoup

# Download the product page and parse it
html = urlopen('https://www.scottycameron.com/store/product/3494')
bs = BeautifulSoup(html, 'html.parser')

# Print the src of every <img> that has one
images = bs.find_all('img')
for img in images:
    if img.has_attr('src'):
        print(img['src'])

Output

/img/icon-header-user.png
/img/icon-header-cart.png
https://www.scottycameron.com/media/18299/puttertarchivenav_jan2021.jpg
https://www.scottycameron.com/media/18302/customizenav_jan2021.jpg
https://www.scottycameron.com/media/18503/showcasenav_2_2021.jpg
https://www.scottycameron.com/media/18454/2021phtmx_new_nws_thmb1.jpg
https://www.scottycameron.com/media/18301/aboutnav_jan2021_b.jpg
https://api.scottycameron.com/Data/Media/Catalog/1/1000/c09b7470-42dd-47e5-a244-9ef3d073c742LICENSE PLATE FRAME - SCOTTY CAMERON FINE MILLED PUTTERS.jpg
/store/content/images/loading.svg

This collects every image on the page; from that list you can pick out the one you need and process it further.
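Rather than printing every `src`, you can narrow the search to the product image. This is a minimal sketch, assuming the product `<img>` carries the same `data-test-selector="imgProductImage"` attribute shown in the question's HTML. It parses an inline copy of that snippet (plus one unrelated icon) so it runs without a network request; `SAMPLE_HTML` and `product_image_urls` are illustrative names, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Inline copy of the markup from the question, plus an unrelated header icon,
# so the example runs without any network access.
SAMPLE_HTML = """
<div class="product-img">
  <a data-test-selector="linkProductURL" href="https://www.scottycameron.com/store/product/3494">
    <div class="image" style="min-height: 350px;">
      <img data-test-selector="imgProductImage" id="img-3494" class="img-responsive b-lazy b-loaded"
           src="https://api.scottycameron.com/Data/Media/Catalog/1/370/c09b7470-42dd-47e5-a244-9ef3d073c742LICENSE%20PLATE%20FRAME%20-%20SCOTTY%20CAMERON%20FINE%20MILLED%20PUTTERS.jpg">
    </div>
  </a>
</div>
<img src="/img/icon-header-cart.png">
"""


def product_image_urls(html):
    """Return the src of every <img> tagged as a product image (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Match on the data-test-selector attribute instead of the class list,
    # and require a src attribute so img["src"] cannot raise KeyError.
    images = soup.find_all("img", attrs={"data-test-selector": "imgProductImage"}, src=True)
    return [img["src"] for img in images]


print(product_image_urls(SAMPLE_HTML))
```

The header icon is skipped because it lacks the `data-test-selector` attribute, so only the product image URL is printed.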

Upvotes: 0

baduker

Reputation: 20022

You might want to check if there's an image tag in the first place and then reach for the attribute:

from bs4 import BeautifulSoup

element = """
<div class="product-img">
   <a data-test-selector="linkProductURL" href="https://www.scottycameron.com/store/product/3494">
      <div class="image" style="min-height: 350px;">
         <img data-test-selector="imgProductImage" id="img-3494" class="img-responsive b-lazy b-loaded" 
         src="https://api.scottycameron.com/Data/Media/Catalog/1/370/c09b7470-42dd-47e5-a244-9ef3d073c742LICENSE%20PLATE%20FRAME%20-%20SCOTTY%20CAMERON%20FINE%20MILLED%20PUTTERS.jpg">
       </div>
</div>"""

image = BeautifulSoup(element, "html.parser").find("img", class_="img-responsive b-lazy b-loaded")
if image is not None:
    print(image["src"])

Output:

https://api.scottycameron.com/Data/Media/Catalog/1/370/c09b7470-42dd-47e5-a244-9ef3d073c742LICENSE%20PLATE%20FRAME%20-%20SCOTTY%20CAMERON%20FINE%20MILLED%20PUTTERS.jpg
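`find()` returns `None` when nothing matches, and subscripting `None` raises exactly the `TypeError` from the question. One way that could happen here is an assumption the question does not confirm: `b-loaded` looks like a class added in the browser by a lazy-loading script, so the HTML your scraper receives may contain only `img-responsive b-lazy`. A multi-class string passed to `class_` must equal the whole `class` attribute, while a CSS selector checks each class independently. The sketch below uses hypothetical server-side markup without `b-loaded` to illustrate the difference.

```python
from bs4 import BeautifulSoup

# Hypothetical markup as a server might send it, before any lazy-load script
# has added the "b-loaded" class (an assumption about the real page).
RAW_HTML = """
<div class="product-img">
  <img data-test-selector="imgProductImage" id="img-3494" class="img-responsive b-lazy"
       src="https://api.scottycameron.com/Data/Media/Catalog/1/370/c09b7470-42dd-47e5-a244-9ef3d073c742LICENSE%20PLATE%20FRAME%20-%20SCOTTY%20CAMERON%20FINE%20MILLED%20PUTTERS.jpg">
</div>
"""

soup = BeautifulSoup(RAW_HTML, "html.parser")

# A multi-class string must equal the whole class attribute, so this finds nothing.
exact = soup.find("img", class_="img-responsive b-lazy b-loaded")
print(exact)  # None, so exact["src"] would raise the TypeError from the question

# A CSS selector matches each listed class independently, in any order.
img = soup.select_one("img.img-responsive.b-lazy")
if img is not None:
    # Fall back to data-src in case the real page only sets the URL there (assumption).
    print(img.get("src") or img.get("data-src"))
```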

EDIT:

As per your comment, try this:

items = []
for ele in array:
    title = ele.find('h4', {'class': 'title'}).text
    price = ele.find('span', {'data-test-selector': 'spanPrice'}).text
    img = ele.find('img', {'class': 'img-responsive b-lazy b-loaded'})
    # Only read ['src'] when find() actually returned a tag
    if img is not None:
        items.append([title, price, img["src"]])
    else:
        items.append([title, price, "No image source"])
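The same `None` guard can be applied to every field, because the title and price lookups would fail in the same way if a product card lacked them. Below is a minimal sketch under stated assumptions: the listing markup, the `div.product` container (standing in for the question's `array`), and the `text_or_default` / `parse_cards` helpers are all hypothetical. It also matches the image on the single class `b-lazy`, which succeeds whatever other classes are present.

```python
from bs4 import BeautifulSoup

# Hypothetical listing markup: one complete card and one card with no image.
LISTING_HTML = """
<div class="product">
  <h4 class="title">Product A</h4>
  <span data-test-selector="spanPrice">$10.00</span>
  <img class="img-responsive b-lazy b-loaded" src="https://example.com/a.jpg">
</div>
<div class="product">
  <h4 class="title">Product B</h4>
  <span data-test-selector="spanPrice">$20.00</span>
</div>
"""


def text_or_default(tag, default=""):
    """Return the tag's stripped text, or default if find() returned None."""
    return tag.get_text(strip=True) if tag is not None else default


def parse_cards(html):
    soup = BeautifulSoup(html, "html.parser")
    items = []
    # "div.product" is a hypothetical container selector.
    for ele in soup.select("div.product"):
        title = text_or_default(ele.find("h4", {"class": "title"}))
        price = text_or_default(ele.find("span", {"data-test-selector": "spanPrice"}))
        # A single class name matches if it is any one of the tag's classes.
        img = ele.find("img", class_="b-lazy")
        src = img.get("src", "No image source") if img is not None else "No image source"
        items.append([title, price, src])
    return items


for row in parse_cards(LISTING_HTML):
    print(row)
```

Each card yields one `[title, price, src]` list, so the result stays a list of rows even when an image is missing.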

Upvotes: 1
