Reputation: 37
Good evening, everybody. I am trying to scrape images from this website:
https://unsplash.com/t/wallpapers
Yes, I know they have an API, but I want to try my own scraping code first and only then use the API.
This is my code:
from requests_html import HTMLSession

session = HTMLSession()
url = "https://unsplash.com/t/wallpapers"
r = session.get(url)
r.html.render(sleep=3)  # render the JavaScript-driven page before scraping
images = r.html.find("._2UpQX")
imglinks = []
for image in images:
    imglinks.append(image.attrs["src"])
print(imglinks)
I am only able to get 6 links for the images :(
Here are screenshots of the output and of the website's CSS:
[Screenshot: Output]
[Screenshot: CSS of website]
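As a side note, the src links the page serves often point to the same photo at different sizes, differing only in the query string. A small sketch of how the scraped list could be normalized before deduplicating; the URL shapes below are hypothetical examples, not values taken from the site:

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(src):
    """Drop the query string so the same photo at different
    sizes collapses to one canonical URL."""
    parts = urlsplit(src)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

# Hypothetical src values, in the general shape of CDN image links
links = [
    "https://images.unsplash.com/photo-123?w=400&q=80",
    "https://images.unsplash.com/photo-123?w=1080&q=80",
    "https://images.unsplash.com/photo-456?w=400&q=80",
]
unique = {normalize(s) for s in links}
print(len(unique))  # 2 distinct photos
```

This keeps the link list honest: six raw links may be fewer than six actual photos.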
Upvotes: 1
Views: 271
Reputation: 157
I visited the website and noticed that it renders only the images currently on screen, i.e. when you scroll, the images above the viewport are no longer rendered and new ones are rendered in their place. The number of images also changes with the screen size.
I tried searching for a way to send a larger screen size, but I was unable to find one.
I do have another idea, though: we can keep scrolling while scanning for images after every scroll.
It works! I got 23 images running the script below (the number actually varies from run to run, and even I am not sure why).
from requests_html import HTMLSession

max_levels = 10
scroll_increment = 10
imglinks = set()
session = HTMLSession()
url = "https://unsplash.com/t/wallpapers"
scroll = 0
for level in range(max_levels):
    print('level', level, 'scroll', scroll)
    r = session.get(url)
    r.html.render(scrolldown=scroll)  # scroll this many times before scraping
    scroll += scroll_increment
    images = r.html.find("._2UpQX")
    print('new images found', len(images))
    for image in images:
        imglinks.add(image.attrs["src"])
    print('unique images found till now', len(imglinks))
session.close()
print(imglinks)
print(len(imglinks))
I will leave it to you to explore the scroll length and the number of scrolls required.
I didn't try How to Crawl Infinite Scrolling Pages using Python, but it may also help you.
Upvotes: 1