Reputation: 370
I have a nice URL structure to loop through:
https://marco.ccr.buffalo.edu/images?page=0&score=Clear
https://marco.ccr.buffalo.edu/images?page=1&score=Clear
https://marco.ccr.buffalo.edu/images?page=2&score=Clear
...
I want to loop through each of these pages and download the 21 images (JPEG or PNG) on each one. I've seen several Beautiful Soup examples, but I'm still struggling to get something that will loop through the URLs and download multiple images. I think I can use urllib to loop through each URL like this, but I'm not sure where the image saving comes in. Any help would be appreciated, and thanks in advance!
for i in range(0,10):
    urllib.urlretrieve('https://marco.ccr.buffalo.edu/images?page=' + str(i) + '&score=Clear')
I was trying to follow this post but I was unsuccessful: How to extract and download all images from a website using beautifulSoup?
Upvotes: 0
Views: 2122
Reputation: 71471
You can use requests:
from bs4 import BeautifulSoup as soup
import requests, contextlib, re, os

@contextlib.contextmanager
def get_images(url: str):
    d = soup(requests.get(url).text, 'html.parser')
    # keep only anchors that link to an image page; pair each thumbnail src with the extension taken from its alt text
    yield [[i.find('img')['src'], re.findall(r'(?<=\.)\w+$', i.find('img')['alt'])[0]] for i in d.find_all('a', href=True) if re.findall(r'/image/\d+', i['href'])]

n = 3 #end value
os.makedirs('MARCO_images', exist_ok=True) #added for automation purposes, folder can be named anything, as long as the proper name is used when saving below
for i in range(n):
    with get_images(f'https://marco.ccr.buffalo.edu/images?page={i}&score=Clear') as links:
        print(links)
        for c, [link, ext] in enumerate(links, 1):
            with open(f'MARCO_images/MARCO_img_{i}{c}.{ext}', 'wb') as f:
                f.write(requests.get(f'https://marco.ccr.buffalo.edu{link}').content)
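The list comprehension in get_images leans on two regular expressions, so it may help to see them in isolation. The sketch below applies them to made-up href and alt values; the sample strings and the helper names is_image_link and extension_from_alt are assumptions for illustration, not taken from the site's real markup:

```python
import re

# Matches hrefs that point at an individual image page, e.g. "/image/123".
IMAGE_HREF = re.compile(r'/image/\d+')
# Captures the run of word characters after the final dot, i.e. the extension.
EXTENSION = re.compile(r'(?<=\.)\w+$')

def is_image_link(href: str) -> bool:
    """Return True when the href contains an /image/<digits> path."""
    return bool(IMAGE_HREF.search(href))

def extension_from_alt(alt: str) -> str:
    """Return the file extension at the end of an alt text such as 'x.jpg'."""
    return EXTENSION.findall(alt)[0]

# Hypothetical sample values, only to exercise the two patterns.
print(is_image_link('/image/4821'))         # True
print(is_image_link('/images?page=1'))      # False
print(extension_from_alt('sample_01.jpg'))  # jpg
```

Note that extension_from_alt raises an IndexError when the alt text has no trailing extension, which is also how the one-liner above behaves, so a page with differently formed alt text would need a fallback.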
Now, inspecting the contents of the MARCO_images directory yields:
print(os.listdir('/Users/ajax/MARCO_images'))
Output:
['MARCO_img_1.jpg', 'MARCO_img_10.jpg', 'MARCO_img_11.jpg', 'MARCO_img_12.jpg', 'MARCO_img_13.jpg', 'MARCO_img_14.jpg', 'MARCO_img_15.jpg', 'MARCO_img_16.jpg', 'MARCO_img_17.jpg', 'MARCO_img_18.jpg', 'MARCO_img_19.jpg', 'MARCO_img_2.jpg', 'MARCO_img_20.jpg', 'MARCO_img_21.jpg', 'MARCO_img_3.jpg', 'MARCO_img_4.jpg', 'MARCO_img_5.jpg', 'MARCO_img_6.jpg', 'MARCO_img_7.jpg', 'MARCO_img_8.jpg', 'MARCO_img_9.jpg']
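The listing above is in plain string order, so MARCO_img_10.jpg lands before MARCO_img_2.jpg. If numeric ordering matters, one option is to zero-pad the page and image numbers when building the file name. A minimal sketch, assuming a hypothetical helper image_filename and a naming scheme chosen for illustration (not the one used in the code above):

```python
def image_filename(page: int, index: int, ext: str) -> str:
    """Build a zero-padded name so string order matches numeric order."""
    # Widths of 3 and 2 are assumptions; widen them for more pages or images.
    return f'MARCO_img_{page:03d}_{index:02d}.{ext}'

names = [image_filename(0, c, 'jpg') for c in (1, 2, 10, 21)]
print(names == sorted(names))  # True
```

Separating the page and image numbers with an underscore also keeps names unambiguous once the page number reaches two digits.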
Upvotes: 3