anuar

Reputation: 86

Scraping large amount of data with beautifulsoup, process being killed?

There are exactly 100 items per page. I'm assuming some kind of memory limit is causing the process to be killed. I also suspect that appending every item to a single list variable is not best practice for memory efficiency. Would opening a text file and writing to it as I go be better? A test with 10 pages builds the list successfully in about 12 seconds, but when I try with 9500 pages the process gets killed automatically after about an hour.

import requests
from bs4 import BeautifulSoup
import timeit

def lol_scrape():
  start = timeit.default_timer()

  summoners_listed = []
  for i in range(9500):
    URL = "https://www.op.gg/leaderboards/tier?region=na&page="+str(i+1)
    user_agent = {"User-Agent": "Mozilla/5.0"}  # placeholder user-agent string
    page = requests.get(URL, headers = user_agent)
    soup = BeautifulSoup(page.content, "html.parser")
    results = soup.find('tbody')
    summoners = results.find_all('tr')
    for i in range(len(summoners)):
      name = summoners[i].find('strong')
      summoners_listed.append(name.string)
    
  stop = timeit.default_timer()

  print('Time: ', stop - start)
  return summoners_listed
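
For comparison, here is a sketch of the text-file idea from the question: write each name to disk as it is scraped instead of accumulating one big list. The User-Agent value and the output path are placeholders, not values from the original script.

import requests
from bs4 import BeautifulSoup

def lol_scrape_to_file(pages=9500, path='summoners.txt'):
  user_agent = {"User-Agent": "Mozilla/5.0"}  # placeholder user-agent string
  with open(path, 'w', encoding='utf-8') as f:
    for i in range(pages):
      URL = "https://www.op.gg/leaderboards/tier?region=na&page="+str(i+1)
      page = requests.get(URL, headers=user_agent)
      soup = BeautifulSoup(page.content, "html.parser")
      results = soup.find('tbody')
      for row in results.find_all('tr'):
        name = row.find('strong')
        if name is not None:
          # Write immediately so nothing accumulates in memory
          f.write(str(name.string) + '\n')

This keeps memory usage roughly flat no matter how many pages are scraped, since each name is flushed to disk rather than retained in a list.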

Upvotes: 1

Views: 536

Answers (1)

anuar

Reputation: 86

Credit to @1extralime

All I did was make a CSV for every page instead of continually appending to one super long list.

import requests
from bs4 import BeautifulSoup
import timeit
import pandas as pd

def lol_scrape():
  start = timeit.default_timer()

  for i in range(6500):
    # Moved variable inside loop to reset it every iteration
    summoners_listed = []
    URL = "https://www.op.gg/leaderboards/tier?region=na&page="+str(i+1)
    user_agent = {"User-Agent": "Mozilla/5.0"}  # placeholder user-agent string
    page = requests.get(URL, headers = user_agent)
    soup = BeautifulSoup(page.content, "html.parser")
    results = soup.find('tbody')
    summoners = results.find_all('tr')
    for x in range(len(summoners)):
      name = summoners[x].find('strong')
      summoners_listed.append(name.string)
    
    # Make a new df with the list values then save to a new csv
    df = pd.DataFrame(summoners_listed)
    df.to_csv('all_summoners/summoners_page'+str(i+1))  
    
  stop = timeit.default_timer()

  print('Time: ', stop - start)

Also, as a note to my future self or anyone else reading: this method is way superior because, had the process failed at any time, all the successful CSVs were already saved and I could just restart where it left off.
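
A minimal sketch of that restart idea, assuming the same all_summoners/ output paths as above: check whether a page's CSV already exists and skip the request if it does.

import os

def already_scraped(page_number, out_dir='all_summoners'):
  # True if this page's CSV was already saved by a previous run
  return os.path.exists(os.path.join(out_dir, 'summoners_page' + str(page_number)))

# Inside the scraping loop, before requests.get:
#   if already_scraped(i + 1):
#     continue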

Upvotes: 2
