Reputation: 137
I'm trying to scrape all the image links from this webpage using the requests module. When I use this link, I can only scrape the image links that appear before the rest of the content, which loads as you scroll down. However, if I use this link, I can get all the image IDs by incrementing the last number in the URL. The problem is that I can't turn those IDs into full image links.
I've tried with:
import requests
from bs4 import BeautifulSoup

url = 'https://stocksnap.io/api/search-photos/phone/relevance/desc/1'

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36'
    r = s.get(url)
    for item in r.json()['results']:
        print(item['img_id'])
How can I grab all the image links from the landing page of that website?
P.S. The first few sponsored image links should be ignored, as they are not included in the API either.
Upvotes: 1
Views: 55
Reputation: 195438
Inspecting the page shows that the image URLs are built from the first two keywords and the image ID returned by the API:
import requests

url = 'https://stocksnap.io/api/search-photos/phone/relevance/desc/{}'

page = 1
while True:
    data = requests.get(url.format(page)).json()
    # An empty result list means there are no more pages
    if not data['results']:
        break
    for r in data['results']:
        print('https://stocksnap.io/photo/{}-{}-{}'.format(r['keywords'][0], r['keywords'][1], r['img_id']))
    page += 1
Prints:
...
https://stocksnap.io/photo/iphone-cellphone-LNXYMM77SS
https://stocksnap.io/photo/business-technology-OGLUHZAPGF
https://stocksnap.io/photo/samsung-android-7ZALGLUAAW
https://stocksnap.io/photo/apple-macbook-55A6840521
https://stocksnap.io/photo/woman-talking-54C3E9FE9D
https://stocksnap.io/photo/samsung-galaxy-BB3307280A
https://stocksnap.io/photo/parc-bench-3D99A31C0C
https://stocksnap.io/photo/iphone-cellphone-E2C541A7DC
https://stocksnap.io/photo/iphone-mockup-167A645BDC
https://stocksnap.io/photo/mac-keyboard-BA9AFFE0BF
https://stocksnap.io/photo/sony-android-EB939B3311
https://stocksnap.io/photo/iphone-cellphone-B962ABCAC7
https://stocksnap.io/photo/building-man-D49A8BB4AE
https://stocksnap.io/photo/technology-computer-C9B37875B9
https://stocksnap.io/photo/iphone-cellphone-381F0FD1EE
https://stocksnap.io/photo/work-bag-96E1A8F1CB
https://stocksnap.io/photo/iphone-phone-70FE8C00C9
https://stocksnap.io/photo/iphone-mockup-9FCDF4E1F5
https://stocksnap.io/photo/young-girl-BE8BA006E6
https://stocksnap.io/photo/young-girl-7174B21D56
https://stocksnap.io/photo/man-woman-6XELVX8KAN
https://stocksnap.io/photo/nexus-smartphones-UAXILBRNUL
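If it helps, the URL construction can be pulled into a small helper so it can be checked without hitting the API. This is only a sketch: the function name `photo_page_url` is made up for illustration, and the fallback for results with fewer than two keywords is an assumption (I have not verified that the site serves such URLs).

```python
def photo_page_url(result):
    """Build a photo page URL from one API result dict.

    Assumes the pattern seen in the output above:
    https://stocksnap.io/photo/<keyword1>-<keyword2>-<img_id>
    """
    # Take at most the first two keywords; slicing avoids an IndexError
    # when a result has fewer than two (an assumed edge case).
    keywords = list(result.get("keywords", []))[:2]
    slug = "-".join(keywords + [result["img_id"]])
    return "https://stocksnap.io/photo/" + slug


# Sample dict shaped like one entry of data['results'] (values are illustrative)
sample = {"img_id": "LNXYMM77SS", "keywords": ["iphone", "cellphone", "mobile"]}
print(photo_page_url(sample))
```

Inside the loop you would then call `photo_page_url(r)` instead of formatting the string inline.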
EDIT: To get .jpg links, the same method applies:
import requests

url = 'https://stocksnap.io/api/search-photos/phone/relevance/desc/{}'

page = 1
while True:
    data = requests.get(url.format(page)).json()
    # An empty result list means there are no more pages
    if not data['results']:
        break
    for r in data['results']:
        print('https://cdn.stocksnap.io/img-thumbs/280h/{}-{}_{}.jpg'.format(r['keywords'][0], r['keywords'][1], r['img_id']))
    page += 1
Prints:
...
https://cdn.stocksnap.io/img-thumbs/280h/iphone-cellphone_B962ABCAC7.jpg
https://cdn.stocksnap.io/img-thumbs/280h/building-man_D49A8BB4AE.jpg
https://cdn.stocksnap.io/img-thumbs/280h/technology-computer_C9B37875B9.jpg
https://cdn.stocksnap.io/img-thumbs/280h/iphone-cellphone_381F0FD1EE.jpg
https://cdn.stocksnap.io/img-thumbs/280h/work-bag_96E1A8F1CB.jpg
https://cdn.stocksnap.io/img-thumbs/280h/iphone-phone_70FE8C00C9.jpg
https://cdn.stocksnap.io/img-thumbs/280h/iphone-mockup_9FCDF4E1F5.jpg
https://cdn.stocksnap.io/img-thumbs/280h/young-girl_BE8BA006E6.jpg
https://cdn.stocksnap.io/img-thumbs/280h/young-girl_7174B21D56.jpg
https://cdn.stocksnap.io/img-thumbs/280h/man-woman_6XELVX8KAN.jpg
https://cdn.stocksnap.io/img-thumbs/280h/nexus-smartphones_UAXILBRNUL.jpg
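The pagination loop above can also be wrapped in a generator, which makes it easier to reuse and to test with a stub instead of the live API. This is a rough sketch under a few assumptions: the names `fetch_page` and `iter_thumbnail_urls` are hypothetical, the `max_pages` cap and the 10-second timeout are my own additions as safety nets, and results with fewer than two keywords are simply skipped.

```python
API_URL = "https://stocksnap.io/api/search-photos/phone/relevance/desc/{}"
THUMB_URL = "https://cdn.stocksnap.io/img-thumbs/280h/{}-{}_{}.jpg"


def fetch_page(page):
    """Fetch the list of results for one API page (network call)."""
    # Imported here so the generator below can be used with a stub
    # fetcher even where requests is not installed.
    import requests

    response = requests.get(API_URL.format(page), timeout=10)
    response.raise_for_status()
    return response.json()["results"]


def iter_thumbnail_urls(fetch=fetch_page, max_pages=100):
    """Yield thumbnail URLs page by page until a page comes back empty.

    max_pages is an assumed upper bound to avoid looping forever.
    """
    for page in range(1, max_pages + 1):
        results = fetch(page)
        if not results:
            break
        for r in results:
            # The URL pattern needs two keywords; skip results without them.
            if len(r["keywords"]) < 2:
                continue
            yield THUMB_URL.format(r["keywords"][0], r["keywords"][1], r["img_id"])
```

Calling `list(iter_thumbnail_urls())` would then collect every thumbnail link for the search, and passing a different `fetch` function lets you try the logic on canned data first.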
Upvotes: 3