Reputation: 187
I am struggling to get an image from this webpage. I can get the title, price, and other elements fine, just not the image.
<div class="product-img">
<a data-test-selector="linkProductURL" href="https://www.scottycameron.com/store/product/3494">
<div class="image" style="min-height: 350px;">
<img data-test-selector="imgProductImage" id="img-3494" class="img-responsive b-lazy b-loaded"
src="https://api.scottycameron.com/Data/Media/Catalog/1/370/c09b7470-42dd-47e5-a244-9ef3d073c742LICENSE%20PLATE%20FRAME%20-%20SCOTTY%20CAMERON%20FINE%20MILLED%20PUTTERS.jpg">
The code I am currently using is:
for ele in array:
    item = [ele.find('h4', {'class': 'title'}).text,  # title
            ele.find('span', {'data-test-selector': 'spanPrice'}).text,  # price
            ele.find('img', {'class': 'img-responsive b-lazy b-loaded'})['src']]  # image URL
But that returns:
TypeError: 'NoneType' object is not subscriptable
Does anyone have any idea?
Upvotes: 1
Views: 109
Reputation: 41
Try this:
from urllib.request import urlopen
from bs4 import BeautifulSoup

# Download the product page and parse it
html = urlopen('https://www.scottycameron.com/store/product/3494')
bs = BeautifulSoup(html, 'html.parser')

# Print the src of every <img> tag that has one
images = bs.find_all('img')
for img in images:
    if img.has_attr('src'):
        print(img['src'])
Output:
/img/icon-header-user.png
/img/icon-header-cart.png
https://www.scottycameron.com/media/18299/puttertarchivenav_jan2021.jpg
https://www.scottycameron.com/media/18302/customizenav_jan2021.jpg
https://www.scottycameron.com/media/18503/showcasenav_2_2021.jpg
https://www.scottycameron.com/media/18454/2021phtmx_new_nws_thmb1.jpg
https://www.scottycameron.com/media/18301/aboutnav_jan2021_b.jpg
https://api.scottycameron.com/Data/Media/Catalog/1/1000/c09b7470-42dd-47e5-a244-9ef3d073c742LICENSE PLATE FRAME - SCOTTY CAMERON FINE MILLED PUTTERS.jpg
/store/content/images/loading.svg
This collects the URL of every image on the page. From that list you can pick out the one you need and process it further.
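As a rough sketch of that further processing, the list can be narrowed to the product image by matching on the `data-test-selector` attribute shown in the question. The HTML string below is a trimmed-down stand-in for the downloaded page, and the `example.com` URL is a placeholder, not the real one.

```python
from bs4 import BeautifulSoup

# Stand-in for the downloaded page; the product URL is a placeholder.
html = """
<img src="/img/icon-header-user.png">
<img data-test-selector="imgProductImage" id="img-3494"
     src="https://example.com/product.jpg">
<img src="/store/content/images/loading.svg">
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only <img> tags that carry the product-image marker and have a src.
product_images = [
    img["src"]
    for img in soup.find_all("img", attrs={"data-test-selector": "imgProductImage"})
    if img.has_attr("src")
]
print(product_images)
```

Filtering on the attribute rather than on the position in the list should keep working if the site adds or removes header icons.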
Upvotes: 0
Reputation: 20022
You might want to check whether there's an image tag in the first place before reaching for the attribute:
from bs4 import BeautifulSoup
element = """
<div class="product-img">
<a data-test-selector="linkProductURL" href="https://www.scottycameron.com/store/product/3494">
<div class="image" style="min-height: 350px;">
<img data-test-selector="imgProductImage" id="img-3494" class="img-responsive b-lazy b-loaded"
src="https://api.scottycameron.com/Data/Media/Catalog/1/370/c09b7470-42dd-47e5-a244-9ef3d073c742LICENSE%20PLATE%20FRAME%20-%20SCOTTY%20CAMERON%20FINE%20MILLED%20PUTTERS.jpg">
</div>
</div>"""
image = BeautifulSoup(element, "html.parser").find("img", class_="img-responsive b-lazy b-loaded")
if image is not None:
    print(image["src"])
Output:
https://api.scottycameron.com/Data/Media/Catalog/1/370/c09b7470-42dd-47e5-a244-9ef3d073c742LICENSE%20PLATE%20FRAME%20-%20SCOTTY%20CAMERON%20FINE%20MILLED%20PUTTERS.jpg
EDIT:
As per your comment, try this:
items = []
for ele in array:
    title = ele.find('h4', {'class': 'title'}).text
    price = ele.find('span', {'data-test-selector': 'spanPrice'}).text
    img = ele.find('img', {'class': 'img-responsive b-lazy b-loaded'})
    # Only read 'src' if the <img> tag was actually found
    if img is not None:
        items.append([title, price, img['src']])
    else:
        items.append([title, price, "No image source"])
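One possible reason the lookup returns `None`: when you pass a multi-word class string, BeautifulSoup only matches a tag whose `class` attribute is exactly that string. If the HTML you download lacks the `b-loaded` class (for example because a lazy-loading script adds it in the browser), the tag is not found. That is an assumption about this site, not something confirmed here. The sketch below matches on `data-test-selector` instead and falls back to a `data-src` attribute. The helper name `image_url`, the `data-src` fallback, and the sample markup are all hypothetical.

```python
from bs4 import BeautifulSoup

def image_url(ele):
    """Return the product image URL inside ele, or None if there is none."""
    # Match on the data attribute rather than the full class string.
    img = ele.select_one('img[data-test-selector="imgProductImage"]')
    if img is None:
        return None
    # Assumption: a lazy-loaded image may keep its URL in data-src.
    return img.get("src") or img.get("data-src")

# Hypothetical server-side markup: no 'b-loaded' class, URL in data-src.
html = (
    '<div class="product-img">'
    '<img data-test-selector="imgProductImage" class="img-responsive b-lazy" '
    'data-src="https://example.com/product.jpg">'
    '</div>'
)
ele = BeautifulSoup(html, "html.parser")

# The exact class-string match misses this tag...
print(ele.find("img", {"class": "img-responsive b-lazy b-loaded"}))
# ...while the attribute-based helper still finds the URL.
print(image_url(ele))
```

Inside the loop, `image_url(ele)` could take the place of the `find(...)` call and the `None` check. Printing `ele.prettify()` for one element should show which attributes the downloaded HTML really contains.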
Upvotes: 1