Reputation: 145
I use the following code to scrape image URLs from a website. Next, I want to save the images whose URLs end with '.jpg' into a local folder, which can be the folder containing the .py script. I can scrape and print the URLs and create a folder in that location, but I don't know how to save the images. This is my code; any ideas are highly appreciated.
from selenium import webdriver
import requests
import os
from bs4 import BeautifulSoup
import sys

if sys.version_info[0] >= 3:
    from urllib.request import urlretrieve
else:
    # if Not Python 3
    from urllib import urlretrieve

site = 'https://www.amazon.de/dp/B077S8N26F'

# create the target folder next to this script
directory = os.path.dirname(os.path.realpath(__file__)) + '/image_folder/'
if not os.path.exists(directory):
    os.makedirs(directory)

driver = webdriver.Chrome()
driver.get(site)
soup = BeautifulSoup(driver.page_source, 'html.parser')
img_tags = soup.find_all('img')

# skip <img> tags that have no src attribute
urls = [img['src'] for img in img_tags if img.get('src')]
for url in urls:
    print(url)

# only the links that end with .jpg
images = [im for im in urls if im.endswith(".jpg")]
print(images)

for im in images:
    # here is the missing part that saves urls into the folder created
    pass
Upvotes: 1
Views: 425
Reputation: 12195
Each entry in images is the URI/URL of an image file.
To get the image itself, you need to make a separate HTTP request for each URL. You can do this with python-requests.
This existing answer should help you along the way, so I won't repeat it here: How to download image using requests
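As a rough sketch of the "one request per image" idea, something like the following could fill in the missing loop. Note that it uses urlretrieve from the standard library (which the question already imports) rather than python-requests, so it needs no extra package; with requests, the equivalent step would be writing requests.get(url).content to a file opened in 'wb' mode. The helper name save_images is made up for illustration, and the sketch assumes each URL's last path segment is a usable file name.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve


def save_images(urls, directory):
    """Download each URL into `directory` and return the saved file paths.

    `save_images` is a hypothetical helper name, not part of any library.
    """
    os.makedirs(directory, exist_ok=True)
    saved = []
    for url in urls:
        # Assumption: the last path segment of the URL is the file name.
        filename = os.path.basename(urlparse(url).path)
        if not filename:
            continue  # nothing usable to name the file after
        target = os.path.join(directory, filename)
        urlretrieve(url, target)  # one separate request per image
        saved.append(target)
    return saved
```

In the question's script, calling save_images(images, directory) would take the place of the empty "for im in images:" loop. Two images that share the same file name would overwrite each other, so a real script may want to add a counter or a hash to the name.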
Upvotes: 1