Python web scraping with the requests module: URL returns 403 Forbidden

Question

I'm trying to get the data from https://www.ecfr.gov/cgi-bin/ECFR?page=browse using the requests module in Python.

For some reason, I keep getting an HTTP 403 Forbidden response.

import requests

url = "https://www.ecfr.gov/cgi-bin/ECFR?page=browse"

# Headers copied from the browser's developer tools.
# "Host" and "X-Amzn-Trace-Id" are left out: they came from an httpbin.org
# echo, and requests sets the Host header from the URL by itself.
header = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US,en;q=0.9",
    "Cache-Control": "max-age=0",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36",
}

r = requests.get(url, headers=header)
print(r.status_code)  # 403 in my case
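Headers copied from a browser or from an httpbin.org echo can include values that describe that other request, such as Host and X-Amzn-Trace-Id, and replaying them against a different server is at best useless. Below is a minimal sketch of a cleanup step, assuming a hypothetical helper named clean_copied_headers and an illustrative drop list. It also removes "br" from Accept-Encoding, on the assumption that no Brotli package is installed (requests can only decode Brotli-encoded bodies when one is). This only fixes wrong headers; it will not get past a CAPTCHA.

```python
from typing import Dict

# Hypothetical drop list: headers that describe the request they were
# copied from and should be generated fresh (requests sets Host itself).
DROP = {"host", "x-amzn-trace-id", "content-length"}


def clean_copied_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of headers without values that should not be replayed."""
    cleaned: Dict[str, str] = {}
    for name, value in headers.items():
        if name.lower() in DROP:
            continue  # skip: belongs to a different request
        if name.lower() == "accept-encoding":
            # Remove "br" (with or without a ;q= weight) so the server does
            # not send a Brotli body that requests may be unable to decode.
            parts = [p.strip() for p in value.split(",")]
            value = ", ".join(
                p for p in parts if p and p.split(";")[0].strip().lower() != "br"
            )
        cleaned[name] = value
    return cleaned
```

Usage with the code from the question: r = requests.get(url, headers=clean_copied_headers(header))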

I have also tried sending the User-Agent and all the other request headers I see in the browser's developer tools.

I have tried free proxies, rotating User-Agent headers, cookies, and everything else I could get my hands on, but the website still somehow detects that the requests are not coming from a real browser.

The HTML response shows that the website wants me to complete a CAPTCHA.
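Because the site answers with a 403 and a CAPTCHA page instead of the data, a scraper should recognize that case and stop rather than parse the challenge page as if it were content. A minimal sketch, assuming the block page can be identified by its status code or by marker words in the HTML; the function name looks_blocked and the marker list are hypothetical and would need adjusting after inspecting the real response.

```python
# Hypothetical marker words; check them against the actual block page.
BLOCK_MARKERS = ("captcha", "access denied")


def looks_blocked(status_code: int, html: str) -> bool:
    """Return True if a response looks like a bot check rather than data."""
    if status_code in (403, 429):  # Forbidden / Too Many Requests
        return True
    text = html.lower()
    return any(marker in text for marker in BLOCK_MARKERS)


# Usage with requests (needs network access):
#   r = requests.get(url, headers=header)
#   if looks_blocked(r.status_code, r.text):
#       raise SystemExit(f"Blocked with HTTP {r.status_code}; not parsing the page")
```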

Is there any way I can skip that?
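A CAPTCHA is a check the site added on purpose, so before automating requests it is worth checking what the site says crawlers may fetch. A minimal sketch using the standard library's urllib.robotparser; the robots.txt text below is made up for illustration and is not ecfr.gov's real file (for the live file you would call set_url() and read() instead of parse()). The helper name allowed is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content, for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /cgi-bin/
Crawl-delay: 10
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the file as read
    return parser.can_fetch(user_agent, url)
```

With this example file, the /cgi-bin/ browse URL would be disallowed while other paths would be allowed.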
