anonymous13

Reputation: 621

HTTP - Tunnel connection failed: 403 Forbidden error with Python web scraping

I am trying to scrape an HTTP website, and I get the error below when I try to read it:

HTTPSConnectionPool(host='proxyvipecc.nb.xxxx.com', port=83): Max retries exceeded with url: http://campanulaceae.myspecies.info/ (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden',)))
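The host in that error (proxyvipecc.nb.xxxx.com, port 83) is a proxy, not the site being scraped. That suggests the request is being routed through a proxy configured outside the script, commonly via the HTTP_PROXY/HTTPS_PROXY environment variables, which requests honors by default. A rough way to check what Python sees, using only the standard library, is sketched below; the function name show_proxy_settings is hypothetical.

```python
import urllib.request


def show_proxy_settings() -> dict:
    """Print and return the proxy settings Python finds for this process.

    urllib.request.getproxies() reads *_proxy environment variables
    (HTTP_PROXY, HTTPS_PROXY, NO_PROXY, ...; lowercase names take priority).
    On Windows and macOS it falls back to the system proxy settings
    when no such variables are set.
    """
    proxies = urllib.request.getproxies()
    if not proxies:
        print("No proxy configured")
    for scheme, proxy in proxies.items():
        # Keys are the variable names minus "_proxy", e.g. "http", "https", "no".
        print(f"{scheme}: {proxy}")
    return proxies


if __name__ == "__main__":
    show_proxy_settings()
```

If an unexpected proxy shows up here, either correct those variables or, if your network allows direct connections, bypass them; with requests, setting trust_env = False on a Session stops it from reading proxy settings from the environment.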

Below is the code I wrote for a similar website. I also tried urllib and setting a User-Agent header, but I still get the same error.

import requests
from bs4 import BeautifulSoup

url = "http://campanulaceae.myspecies.info/"

response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'})
soup = BeautifulSoup(response.text, 'html.parser')

Can anyone help me with this issue? Thanks in advance.

Upvotes: 6

Views: 47882

Answers (2)

Lugangastar

Reputation: 31

I tried setting the header "User-Agent: Defined", and it worked for me:

import requests
from bs4 import BeautifulSoup

url = "http://campanulaceae.myspecies.info/"
headers = {
    "Accept-Language": "en-US,en;q=0.5",
    "User-Agent": "Defined",
}
response = requests.get(url, headers=headers)
response.raise_for_status()  # raise an HTTPError for 4xx/5xx responses
data = response.text
soup = BeautifulSoup(data, 'html.parser')
print(soup.prettify())
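Since the question mentions trying urllib too, here is a rough standard-library equivalent of the request above that sends the same two headers. The helper names build_request and fetch_html are hypothetical. Note that urllib, like requests, picks up proxy settings from the environment by default, so a misconfigured proxy would fail the same way.

```python
import urllib.request

URL = "http://campanulaceae.myspecies.info/"
HEADERS = {
    "Accept-Language": "en-US,en;q=0.5",
    "User-Agent": "Defined",
}


def build_request(url: str = URL) -> urllib.request.Request:
    """Build a GET request carrying the same headers as the requests example.

    urllib stores header names via str.capitalize(), so they are looked up
    afterwards as "User-agent" and "Accept-language".
    """
    return urllib.request.Request(url, headers=HEADERS)


def fetch_html(url: str = URL, timeout: float = 10.0) -> str:
    """Fetch the page; raises urllib.error.URLError on proxy or connection failures."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

The resulting HTML string can be passed to BeautifulSoup exactly like response.text above.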

If you get the error "bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html-parser.", BeautifulSoup does not recognize the parser name you passed. The built-in parser is spelled "html.parser" (with a dot, not a hyphen). Alternatively, install lxml (pip install lxml) and pass "lxml" instead of "html.parser" when you create the soup; you do not need to import lxml yourself.
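To avoid hard-coding a parser that may not be installed, one option is to pick lxml when it is available and fall back to the built-in parser otherwise. This is a small sketch; pick_parser is a hypothetical helper, not part of BeautifulSoup.

```python
import importlib.util


def pick_parser() -> str:
    """Return "lxml" if the lxml package is installed, else the built-in "html.parser"."""
    if importlib.util.find_spec("lxml") is not None:
        return "lxml"
    return "html.parser"
```

Usage: soup = BeautifulSoup(data, pick_parser())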

Upvotes: 0

Sanjay

Reputation: 2008

You should try passing your proxy explicitly when requesting the URL:

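import requests

url = "http://campanulaceae.myspecies.info/"

# Replace the placeholder strings with your proxy URLs, e.g. "http://host:port"
proxyDict = {
    'http': "add http proxy",
    'https': "add https proxy",
}

response = requests.get(url, proxies=proxyDict)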

You can find more information here.

Upvotes: 3
