Reputation: 43
I'm trying to scrape all (Netflix) movie links from ReelGood.com.
This is my code so far (with help from Stack members):
from bs4 import BeautifulSoup
import requests

URL = "https://reelgood.com/movies/source/netflix"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")

# Write every movie URL found in the page's itemprop markup to a text file
with open("C:/Downloaders/test/Scrape/movies_netflix.txt", "w") as f:
    for link in soup.select('[itemprop=itemListElement] [itemprop=url]'):
        data = link.get('content')
        f.write(data)
        f.write("\n")
This code does output the movie links to a txt file called movies_netflix.txt.
But here is the catch: it only exports the links that are loaded on the default page. If you scroll down, you see a button that loads more titles.
Now what I want is to load the entire page before I scrape it. Personally, I was thinking about a function that clicks the button as long as it's there (once everything is loaded, it disappears).
But I have no idea how to do this, and maybe there is a better way to get all the movies loaded into the page?
Any suggestions?
Upvotes: 2
Views: 8874
Reputation: 406
BeautifulSoup doesn't have a click function. You could do this through Selenium, which does; a rough sketch of that approach is below. There is another option, though, that lets you just use BeautifulSoup, described after the sketch.
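For the Selenium route, a minimal sketch might look like the following. The button locator is an assumption (I haven't inspected ReelGood's markup), so you'd need to adjust the XPath, and an explicit wait would be more robust than time.sleep:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
import time

driver = webdriver.Chrome()
driver.get("https://reelgood.com/movies/source/netflix")

# Click the load-more button until it no longer exists on the page
while True:
    try:
        # Hypothetical locator -- inspect the real button and adjust
        button = driver.find_element(By.XPATH, "//button[contains(., 'more')]")
        button.click()
        time.sleep(2)  # crude wait for the new rows to load
    except NoSuchElementException:
        break

# The page is now fully loaded; parse it with BeautifulSoup as before
soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()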
For the BeautifulSoup-only option: when you click the button, the URL changes to https://reelgood.com/movies/source/netflix?offset=50.
The offset increments by 50, up to 3750 as far as I can tell.
https://reelgood.com/movies/source/netflix?offset=3750, however, doesn't show you the whole table, just the last page. So you can loop through the pages, collect all the titles on each page, and write them out.
Something like this:
from bs4 import BeautifulSoup
import requests

# Open the file once, outside the loop; opening it with "w" inside the
# loop would overwrite the previous pages on every iteration
with open("C:/Downloaders/test/Scrape/movies_netflix.txt", "w") as f:
    for i in range(0, 3800, 50):  # offsets 0, 50, ..., 3750
        URL = "https://reelgood.com/movies/source/netflix?offset=" + str(i)
        page = requests.get(URL)
        soup = BeautifulSoup(page.content, "html.parser")
        for link in soup.select('[itemprop=itemListElement] [itemprop=url]'):
            data = link.get('content')
            f.write(data)
            f.write("\n")
You might also consider removing the inner for loop's writes, appending all the movies on each page to a list instead, and then writing the whole list to the file at the end. Otherwise you write to the file 76 * 50 times, which could take a long time. A rough sketch of that idea follows.
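Something like this, reusing the same selector and file path as above:

from bs4 import BeautifulSoup
import requests

links = []
for i in range(0, 3800, 50):
    URL = "https://reelgood.com/movies/source/netflix?offset=" + str(i)
    page = requests.get(URL)
    soup = BeautifulSoup(page.content, "html.parser")
    # Collect this page's links in memory instead of writing them one by one
    for link in soup.select('[itemprop=itemListElement] [itemprop=url]'):
        links.append(link.get('content'))

# Write the whole list to the file in one go at the end
with open("C:/Downloaders/test/Scrape/movies_netflix.txt", "w") as f:
    f.write("\n".join(links) + "\n")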
Upvotes: 3