John Doe

Reputation: 43

Python Beautifulsoup - click load more button

I'm trying to scrape all (Netflix) movie links from ReelGood.com.

This is my code so far (with help from Stack members):

from bs4 import BeautifulSoup
import requests
import time

URL = "https://reelgood.com/movies/source/netflix"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")

# Use a context manager so the file is closed once writing is done
with open("C:/Downloaders/test/Scrape/movies_netflix.txt", "w") as f:
    for link in soup.select('[itemprop=itemListElement] [itemprop=url]'):
        data = link.get('content')
        f.write(data)
        f.write("\n")

This code does output the movie links to a text file called movies_netflix.txt.

But here is the catch: it only exports the links that are loaded on the default page. If you scroll down, you see a "load more" button.

What I want is to load the entire page before I scrape it. Personally, I was thinking about a function that clicks the button as long as it's there (once everything is loaded, it disappears).

But I have no idea how to do this. Is there a better way to get all the movies loaded into the page?

Any suggestions?

Helpful info:

HTML source code

Upvotes: 2

Views: 8874

Answers (1)

Beek

Reputation: 406

BeautifulSoup doesn't have a click function. You could do this with Selenium, which does. There is another option, though, that lets you keep using only BeautifulSoup.

When you click the button, the URL changes to https://reelgood.com/movies/source/netflix?offset=50.

The offset increments by 50, up to 3750 as far as I can tell.

However, https://reelgood.com/movies/source/netflix?offset=3750 doesn't show you the whole table, just the last page. So you could loop through the pages, collect all the links on each page, and append them to your list.

Something like:

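from bs4 import BeautifulSoup
import requests

# Open the output file once, before the loop. Reopening it with "w" for
# every page would truncate it and keep only the last page's links.
with open("C:/Downloaders/test/Scrape/movies_netflix.txt", "w") as f:
    for i in range(0, 3800, 50):
        URL = "https://reelgood.com/movies/source/netflix?offset=" + str(i)
        page = requests.get(URL)
        soup = BeautifulSoup(page.content, "html.parser")

        # Each list item stores its movie URL in the "content" attribute
        for link in soup.select('[itemprop=itemListElement] [itemprop=url]'):
            data = link.get('content')
            f.write(data)
            f.write("\n")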

You might also consider dropping the inner for loop that writes link by link: append all the movies on each page to a list, then write the whole list to the file at the end. Otherwise you perform one write per movie, roughly 76*50 of them across the 76 pages, which could take a long time.
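The "collect first, write once" idea could be sketched roughly as below. The function names (collect_links, reelgood_page_links, save_links) are hypothetical, and so are the 30-second timeout and the early stop on an empty page. The early stop assumes that an offset past the end of the catalogue returns no matching elements, which I have not verified against the site. The selector and the offset step of 50 are taken from the code above.

```python
def collect_links(get_page_links, step=50, last_offset=3750):
    """Gather the links from every page into a single list.

    get_page_links(offset) must return the list of links found at that offset.
    Stops early when a page comes back empty (assumption: that means the end
    of the catalogue was reached before last_offset).
    """
    all_links = []
    for offset in range(0, last_offset + step, step):
        links = get_page_links(offset)
        if not links:
            break
        all_links.extend(links)
    return all_links


def reelgood_page_links(offset):
    """Fetch one page of the Netflix list and return its movie URLs."""
    # Imported here so collect_links() can be tried out without these packages.
    import requests
    from bs4 import BeautifulSoup

    url = "https://reelgood.com/movies/source/netflix"
    page = requests.get(url, params={"offset": offset}, timeout=30)
    page.raise_for_status()
    soup = BeautifulSoup(page.content, "html.parser")
    tags = soup.select('[itemprop=itemListElement] [itemprop=url]')
    # Skip tags without a "content" attribute instead of collecting None.
    return [tag.get("content") for tag in tags if tag.get("content")]


def save_links(links, path):
    """Write one link per line, opening the file a single time."""
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(link + "\n" for link in links)
```

Calling save_links(collect_links(reelgood_page_links), "movies_netflix.txt") would then fetch every page and write the file in one go. Passing the page fetcher in as an argument keeps the paging loop separate from the network code, so the loop can be checked with a stand-in function before hitting the real site.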

Upvotes: 3
