Király Csaba

Reputation: 11

Web scraping - page stops loading after 5-6 requests

I'm trying to scrape the subpages of a specific website using requests and bs4. The page URLs are stored in a list that I loop over. The script works fine with other websites, so I think the problem is with this site itself. I can't access the site in my browsers either, or only for a few seconds. I've tried all of my browsers (Chrome, Firefox, Edge, Explorer) and removed every cookie and all other browsing data, etc. I'm using these headers:

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36",
    "Upgrade-Insecure-Requests": "1",
    "DNT": "1",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
}

and here is the code to request the page:

import requests
cz_link = requests.get(cz_page, headers=headers, timeout=10, verify=False)

where "cz_page" is the current item from the list of pages I want to parse.
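A minimal sketch of that loop, assuming the request is wrapped in a callable. `fetch_all` and `fetch` are hypothetical names; `fetch` stands in for something like `lambda url: requests.get(url, headers=headers, timeout=10).text`. Failed pages are collected instead of stopping the loop (requests' exceptions and the built-in TimeoutError are all OSError subclasses), which shows after how many pages the site stops answering. If the block shows up as an HTTP error status rather than a timeout, the wrapper would also need to call `raise_for_status()` on the response.

```python
from typing import Callable, Dict, Iterable, Tuple


def fetch_all(
    pages: Iterable[str], fetch: Callable[[str], str]
) -> Tuple[Dict[str, str], Dict[str, OSError]]:
    """Fetch every page in order, keeping successes and failures apart."""
    results: Dict[str, str] = {}
    errors: Dict[str, OSError] = {}
    for cz_page in pages:
        try:
            results[cz_page] = fetch(cz_page)
        except OSError as exc:  # requests' exceptions and TimeoutError subclass OSError
            errors[cz_page] = exc
    return results, errors
```

After a run, `len(results)` and `list(errors)` show exactly where the pages stopped loading.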

After 5 or 6 pages have loaded, the next page won't load.

I checked "https://downforeveryoneorjustme.com/" to see whether the site is up, and it is: "it's just me."

Is there any way to access the pages through Python requests even though I can't load the site in my browser(s)?

My next attempt will be to run the script with a VPN on, but I'm curious whether there is another solution, because I can't use a VPN every time I need to run this script.

Thank you!

Upvotes: 0

Views: 1086

Answers (1)

Király Csaba

Reputation: 11

The solution was to add a delay, but a much longer one than 5 seconds. I experimented with it, and it seems that after 5 pages are loaded I get blocked and have to wait at least 10 minutes before trying again. So I added a counter inside the loop; when it hit 5, I called time.sleep() for 10 minutes and reset the counter. It is slow, but it works. Thanks for the suggestions, though!
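A minimal sketch of the counter-and-sleep approach described above, assuming a limit of 5 requests and a 10-minute wait (the values observed in this answer, not documented limits of the site). `fetch_in_batches` and the injected `fetch`/`sleep` callables are hypothetical names; `fetch` could wrap the `requests.get` call from the question, and `sleep` defaults to `time.sleep`.

```python
import time
from typing import Callable, Iterable, List


def fetch_in_batches(
    pages: Iterable[str],
    fetch: Callable[[str], str],
    batch_size: int = 5,
    pause_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch pages in order, pausing for pause_seconds after every batch_size requests."""
    results: List[str] = []
    counter = 0
    for cz_page in pages:
        if counter == batch_size:
            sleep(pause_seconds)  # wait out the block (10 minutes by default)
            counter = 0  # restart the counter for the next batch
        results.append(fetch(cz_page))
        counter += 1
    return results
```

The counter is checked before each request rather than after it, so the script does not sleep for another 10 minutes after the last page.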

Upvotes: 1
