Reputation: 37
Good evening, everybody. I am trying to scrape images from this website:
https://unsplash.com/t/wallpapers
Yes, I know they have an API, but I want to try my own scraping code first and only then use the API.
This is my code:
from requests_html import HTMLSession

session = HTMLSession()
url = "https://unsplash.com/t/wallpapers"
r = session.get(url)
r.html.render(sleep=3)  # render the JavaScript-driven page before scraping
images = r.html.find("._2UpQX")
imglinks = []
for image in images:
    imglinks.append(image.attrs["src"])
print(imglinks)
I am only able to get 6 links for the images :(
Here are screenshots of the output and of the website's CSS:
[Screenshot: Output]
[Screenshot: CSS of website]
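As a side note, the src links the page serves often point to the same photo at different sizes, differing only in the query string. A small sketch of how the scraped list could be normalized before deduplicating; the URL shapes below are hypothetical examples, not values taken from the site:

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(src):
    """Drop the query string so the same photo at different
    sizes collapses to one canonical URL."""
    parts = urlsplit(src)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

# Hypothetical src values, in the general shape of CDN image links
links = [
    "https://images.unsplash.com/photo-123?w=400&q=80",
    "https://images.unsplash.com/photo-123?w=1080&q=80",
    "https://images.unsplash.com/photo-456?w=400&q=80",
]
unique = {normalize(s) for s in links}
print(len(unique))  # 2 distinct photos
```

This keeps the link list honest: six raw links may be fewer than six actual photos.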
Upvotes: 1
Views: 271
Reputation: 157
I visited the website and noticed that it renders only the images currently on screen, i.e. when you scroll, the images above the viewport are no longer rendered and new ones are rendered in their place. The number of images also changes with the screen size.
I tried searching for a way to send a larger screen size, but I was unable to find one.
I do have another idea, though: we can keep scrolling while scanning for images after every scroll.
It works! I got 23 images running the script below (the number actually varies from run to run, and even I am not sure why).
from requests_html import HTMLSession

max_levels = 10
scroll_increment = 10
imglinks = set()
session = HTMLSession()
url = "https://unsplash.com/t/wallpapers"
scroll = 0
for level in range(max_levels):
    print('level', level, 'scroll', scroll)
    r = session.get(url)
    r.html.render(scrolldown=scroll)  # scroll this many times before scraping
    scroll += scroll_increment
    images = r.html.find("._2UpQX")
    print('new images found', len(images))
    for image in images:
        imglinks.add(image.attrs["src"])
    print('unique images found till now', len(imglinks))
session.close()
print(imglinks)
print(len(imglinks))
I will leave it to you to explore the scroll length and the number of scrolls required.
I didn't try How to Crawl Infinite Scrolling Pages using Python, but it may also help you.
Upvotes: 1