Reputation: 2477
I wrote a script to download the bhavcopies for every trading date. After scraping about two years of data, it seems my IP got blocked.
This code no longer works for me:
import urllib.request
url = 'https://www1.nseindia.com/content/historical/DERIVATIVES/2014/APR/fo01APR2014bhav.csv.zip'
response = urllib.request.urlopen(url)
It gives the following error:
urllib.error.HTTPError: HTTP Error 403: Forbidden
I would like to know how I can use some proxy to get the data. Any help would be really appreciated.
Upvotes: 5
Views: 3170
Reputation: 93
You don't need a proxy to download this file. Adding a Referer header to the request should be enough to get past the 403:
import urllib.request
url = 'https://www1.nseindia.com/content/historical/DERIVATIVES/2014/APR/fo01APR2014bhav.csv.zip'
req = urllib.request.Request(url)
# Add referer header to bypass "HTTP Error 403: Forbidden"
req.add_header('Referer', 'https://www.nseindia.com')
res = urllib.request.urlopen(req)
# Save the response body to file.zip
with open("file.zip", "wb") as f:
    f.write(res.read())
If you still want free proxies, see https://free-proxy-list.net/ and then follow @pyd's answer at https://stackoverflow.com/a/63328368/8009647
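Since the question is about downloading every date, here is a rough sketch of how the request above could be generalised. The URL pattern is only inferred from the single example URL in the question, and the names bhavcopy_url, bhavcopy_request and weekdays are made up for illustration. Market holidays that fall on weekdays are not excluded, so some dates will probably have no file.

```python
from datetime import date, timedelta
import urllib.request

# Assumption: every file follows the pattern of the one URL shown in the question.
BASE = "https://www1.nseindia.com/content/historical/DERIVATIVES"
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def bhavcopy_url(day: date) -> str:
    """Build the archive URL for one date (hypothetical helper)."""
    month = MONTHS[day.month - 1]
    return f"{BASE}/{day.year}/{month}/fo{day.day:02d}{month}{day.year}bhav.csv.zip"


def bhavcopy_request(day: date) -> urllib.request.Request:
    """Create a request carrying the Referer header used in the answer."""
    req = urllib.request.Request(bhavcopy_url(day))
    req.add_header("Referer", "https://www.nseindia.com")
    return req


def weekdays(start: date, end: date):
    """Yield Monday-to-Friday dates from start to end, inclusive."""
    day = start
    while day <= end:
        if day.weekday() < 5:  # 0 = Monday ... 4 = Friday
            yield day
        day += timedelta(days=1)
```

Each request can then be passed to urllib.request.urlopen() exactly as in the code above, ideally with a pause between downloads.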
Upvotes: 1
Reputation: 6159
This works fine:
import requests
headers = {
    'authority': 'www.nseindia.com',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
    'accept': '*/*',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://www.nseindia.com/content/',
    'accept-language': 'en-US,en;q=0.9,lb;q=0.8',
}
url = "https://www1.nseindia.com/content/historical/DERIVATIVES/2014/APR/fo01APR2014bhav.csv.zip"
r = requests.get(url, headers=headers)
with open("data.zip", "wb") as f:
    f.write(r.content)
If you have proxies, pass them through the proxies argument (replace x.x.x.x:pppp with your proxy's address and port):
proxy = {
    "http": "x.x.x.x:pppp",
    "https": "x.x.x.x:pppp",
}
r = requests.get(url, headers=headers, proxies=proxy)
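One caveat: the code above writes whatever the server sends back, so a blocked request could leave an HTML error page saved as data.zip. A minimal sketch of a check, using only the standard library (save_if_zip is a hypothetical helper name, not part of requests):

```python
import io
import zipfile


def save_if_zip(content: bytes, path: str) -> bool:
    """Write content to path only if it looks like a ZIP archive."""
    if not zipfile.is_zipfile(io.BytesIO(content)):
        return False  # e.g. an HTML error page from a blocked request
    with open(path, "wb") as f:
        f.write(content)
    return True
```

With the code above it would be called as save_if_zip(r.content, "data.zip"); a False return value means nothing was written.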
Upvotes: 2
Reputation: 17368
import urllib.request
proxy_host = '1.2.3.4:8080' # host and port of your proxy
url = 'https://www1.nseindia.com/content/historical/DERIVATIVES/2014/APR/fo01APR2014bhav.csv.zip'
req = urllib.request.Request(url)
req.set_proxy(proxy_host, 'http')
response = urllib.request.urlopen(req)
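For more flexibility, you can use a ProxyHandler (see https://docs.python.org/3/library/urllib.request.html):
# Map both schemes to the proxy; the target URL is https, so an 'http'-only mapping would not apply to it
proxy_handler = urllib.request.ProxyHandler({'http': 'http://1.2.3.4:3128/', 'https': 'http://1.2.3.4:3128/'})
proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
# Replace 'realm', 'host', 'username' and 'password' with your proxy's values
proxy_auth_handler.add_password('realm', 'host', 'username', 'password')
opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
response = opener.open(url)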
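If it helps, the proxy handler can be combined with the Referer header suggested in another answer. This is only a sketch: make_opener is a made-up helper name, the proxy address is a placeholder, and whether the site accepts requests through a given proxy is not something I can confirm.

```python
import urllib.request


def make_opener(proxy_url: str, referer: str = "https://www.nseindia.com"):
    """Build an opener that uses one proxy for http and https and adds a Referer."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    # addheaders holds default headers sent with every request made via this opener
    opener.addheaders.append(("Referer", referer))
    return opener
```

Usage would look like opener = make_opener("http://1.2.3.4:3128") followed by response = opener.open(url).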
Upvotes: 2