robots.txt

Reputation: 137

Can't use image IDs to build fully qualified image links

I'm trying to scrape all the image links from this webpage using the requests module. When I use this link, I can only scrape the image links present in the initial page; the rest of the content only shows up while scrolling down. However, if I use this link, I can get all the image IDs by incrementing the last number in the URL. The problem is that I can't turn those IDs into full-fledged image links.

I've tried with:

import requests

url = 'https://stocksnap.io/api/search-photos/phone/relevance/desc/1'

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36'
    r = s.get(url)
    for item in r.json()['results']:
        print(item['img_id'])

How can I grab all the image links from the landing page of that website?

P.S. The first few sponsored image links should be ignored, as they are not included in the API either.

Upvotes: 1

Views: 55

Answers (1)

Andrej Kesely

Reputation: 195438

Inspecting the page shows that the image URLs are built from the image ID and the first two keywords returned by the API:

import requests


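# The trailing {} is the page number of the search results
url = 'https://stocksnap.io/api/search-photos/phone/relevance/desc/{}'

page = 1
while True:
    data = requests.get(url.format(page)).json()

    # An empty result list means there are no more pages
    if not data['results']:
        break

    for r in data['results']:
        # Photo page URL: <first keyword>-<second keyword>-<image ID>
        print('https://stocksnap.io/photo/{}-{}-{}'.format(r['keywords'][0], r['keywords'][1], r['img_id']))

    page += 1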

Prints:

...

https://stocksnap.io/photo/iphone-cellphone-LNXYMM77SS
https://stocksnap.io/photo/business-technology-OGLUHZAPGF
https://stocksnap.io/photo/samsung-android-7ZALGLUAAW
https://stocksnap.io/photo/apple-macbook-55A6840521
https://stocksnap.io/photo/woman-talking-54C3E9FE9D
https://stocksnap.io/photo/samsung-galaxy-BB3307280A
https://stocksnap.io/photo/parc-bench-3D99A31C0C
https://stocksnap.io/photo/iphone-cellphone-E2C541A7DC
https://stocksnap.io/photo/iphone-mockup-167A645BDC
https://stocksnap.io/photo/mac-keyboard-BA9AFFE0BF
https://stocksnap.io/photo/sony-android-EB939B3311
https://stocksnap.io/photo/iphone-cellphone-B962ABCAC7
https://stocksnap.io/photo/building-man-D49A8BB4AE
https://stocksnap.io/photo/technology-computer-C9B37875B9
https://stocksnap.io/photo/iphone-cellphone-381F0FD1EE
https://stocksnap.io/photo/work-bag-96E1A8F1CB
https://stocksnap.io/photo/iphone-phone-70FE8C00C9
https://stocksnap.io/photo/iphone-mockup-9FCDF4E1F5
https://stocksnap.io/photo/young-girl-BE8BA006E6
https://stocksnap.io/photo/young-girl-7174B21D56
https://stocksnap.io/photo/man-woman-6XELVX8KAN
https://stocksnap.io/photo/nexus-smartphones-UAXILBRNUL
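If you want to reuse the URL-building step outside the loop, it can be pulled into a small helper. This is only a sketch: build_photo_url is a name I made up, and returning None for results with fewer than two keywords (or no img_id) is my own assumption, since I haven't checked what the site does for such items.

```python
PHOTO_URL = 'https://stocksnap.io/photo/{}-{}-{}'


def build_photo_url(item):
    """Build a photo page URL from one API result dict.

    Hypothetical helper: returns None when the item has fewer than two
    keywords or no image ID, because the URL pattern for such items is
    not known (assumption).
    """
    keywords = item.get('keywords') or []
    img_id = item.get('img_id')
    if len(keywords) < 2 or not img_id:
        return None
    # Pattern taken from the answer: <first keyword>-<second keyword>-<image ID>
    return PHOTO_URL.format(keywords[0], keywords[1], img_id)
```

Inside the loop you would then call build_photo_url(r) and skip any None values.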

EDIT: To get .jpg links, the same method applies:

import requests


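# The trailing {} is the page number of the search results
url = 'https://stocksnap.io/api/search-photos/phone/relevance/desc/{}'

page = 1
while True:
    data = requests.get(url.format(page)).json()

    # An empty result list means there are no more pages
    if not data['results']:
        break

    for r in data['results']:
        # Thumbnail URL: <first keyword>-<second keyword>_<image ID>.jpg
        print('https://cdn.stocksnap.io/img-thumbs/280h/{}-{}_{}.jpg'.format(r['keywords'][0], r['keywords'][1], r['img_id']))

    page += 1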

Prints:

...

https://cdn.stocksnap.io/img-thumbs/280h/iphone-cellphone_B962ABCAC7.jpg
https://cdn.stocksnap.io/img-thumbs/280h/building-man_D49A8BB4AE.jpg
https://cdn.stocksnap.io/img-thumbs/280h/technology-computer_C9B37875B9.jpg
https://cdn.stocksnap.io/img-thumbs/280h/iphone-cellphone_381F0FD1EE.jpg
https://cdn.stocksnap.io/img-thumbs/280h/work-bag_96E1A8F1CB.jpg
https://cdn.stocksnap.io/img-thumbs/280h/iphone-phone_70FE8C00C9.jpg
https://cdn.stocksnap.io/img-thumbs/280h/iphone-mockup_9FCDF4E1F5.jpg
https://cdn.stocksnap.io/img-thumbs/280h/young-girl_BE8BA006E6.jpg
https://cdn.stocksnap.io/img-thumbs/280h/young-girl_7174B21D56.jpg
https://cdn.stocksnap.io/img-thumbs/280h/man-woman_6XELVX8KAN.jpg
https://cdn.stocksnap.io/img-thumbs/280h/nexus-smartphones_UAXILBRNUL.jpg
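If you'd rather collect the links than print them, the pagination can be wrapped in a generator. This is a rough sketch, not something taken from the site's documentation: fetch_page, iter_thumbnail_urls, the max_pages cap and the 10-second timeout are all my own choices, and items with fewer than two keywords are simply skipped. The fetch function is passed in as a parameter so the loop can be tried with fake data without hitting the network.

```python
API_URL = 'https://stocksnap.io/api/search-photos/phone/relevance/desc/{}'
THUMB_URL = 'https://cdn.stocksnap.io/img-thumbs/280h/{}-{}_{}.jpg'


def fetch_page(page):
    """Fetch one page of API results as a list of dicts (network request)."""
    # Imported here so the generator below can be used with a fake fetcher
    # even where requests is not installed.
    import requests

    response = requests.get(API_URL.format(page), timeout=10)
    response.raise_for_status()
    return response.json().get('results', [])


def iter_thumbnail_urls(fetch=fetch_page, max_pages=50):
    """Yield thumbnail URLs until an empty page or max_pages is reached."""
    for page in range(1, max_pages + 1):
        results = fetch(page)
        if not results:
            break
        for r in results:
            keywords = r.get('keywords') or []
            # Skip items that cannot fill the URL pattern (assumption)
            if len(keywords) >= 2 and r.get('img_id'):
                yield THUMB_URL.format(keywords[0], keywords[1], r['img_id'])


# Example usage (performs real requests):
# links = list(iter_thumbnail_urls())
```

The max_pages cap is just a safety net in case the API never returns an empty page; raise it if the search has more results.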

Upvotes: 3
