Reputation:
I am trying to scrape a website on Google Colab using Beautiful Soup. I can scrape the content on my local machine, but when I try the same thing in the cloud I get this error:
'\nAccess Denied\n\nAccess Denied\n \nYou don\'t have permission to access "http://www.justdial.com/Bangalore/Spardha-Mithra-IAS-KAS-Coaching-Centre-Opposite-Maruthi-Medicals-Vijayanagar/080PXX80-XX80-140120184741-R6P8_BZDET?" on this server.\nReference #18.740f1160.1544263996.61a6bb6e\n\n\n'
When I run the same code on my local machine it works fine, though:
import requests
from bs4 import BeautifulSoup

url = 'https://www.justdial.com/Bangalore/Spardha-Mithra-IAS-KAS-Coaching-Centre-Opposite-Maruthi-Medicals-Vijayanagar/080PXX80-XX80-140120184741-R6P8_BZDET?xid=QmFuZ2Fsb3JlIEJhbmsgRXhhbSBUdXRvcmlhbHM='
# Set a User-Agent so the request doesn't advertise itself as python-requests
res = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(res.content, 'html.parser')
print(res)
Output :
<Response [403]>
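Printing the body shows where the Access Denied text above comes from:

print(res.status_code)   # 403
print(res.text)          # the "Access Denied" page quoted above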
Why is this happening, and is there any way to get around it?
Upvotes: 1
Views: 5264
Reputation: 2282
Pretty sure this is server-side rate-limiting. Your code works fine for me in Colab. You might try Colab's "Reset all runtimes" feature to get assigned a new VM, to rule out any side effects from other notebook code you've run.
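If resetting doesn't help, you can check whether the block is tied to the VM's egress IP and retry with a fuller, browser-like header set. This is a minimal sketch, not a guaranteed fix; httpbin.org/ip is used only to echo the VM's public IP, and the header values are an assumption about what the server fingerprints:

import time
import requests

url = 'https://www.justdial.com/Bangalore/Spardha-Mithra-IAS-KAS-Coaching-Centre-Opposite-Maruthi-Medicals-Vijayanagar/080PXX80-XX80-140120184741-R6P8_BZDET?xid=QmFuZ2Fsb3JlIEJhbmsgRXhhbSBUdXRvcmlhbHM='

# Echo the VM's public IP; after "Reset all runtimes" a new VM
# (and usually a new IP) should show up here.
print(requests.get('https://httpbin.org/ip', timeout=10).json())

# A fuller browser-like header set; a bare 'Mozilla/5.0' is easy to flag.
headers = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/70.0.3538.110 Safari/537.36'),
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9',
}

# Retry with exponential backoff in case the block is transient rate-limiting.
for attempt in range(3):
    res = requests.get(url, headers=headers, timeout=10)
    if res.status_code == 200:
        break
    time.sleep(2 ** attempt)

print(res.status_code)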
Upvotes: 1