Rohit Lamba

Reputation: 186

Collect Data from web link to List/Dataframe

I have a web link as below:

https://www.nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp

I use the code below to collect the data, but I get this error:

requests.exceptions.ConnectionError: ('Connection aborted.', OSError("(10060, 'WSAETIMEDOUT')",))

My Code:

from requests import Session
import lxml.html

expiry_list = []
try:
    session = Session()
    headers = {'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36'}

    session.headers.update(headers)
    url = 'https://www1.nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp'
    params = {'symbolCode': 9999, 'symbol': 'BANKNIFTY', 'instrument': 'OPTIDX', 'date': '-', 'segmentLink': 17}
    response = session.get(url, params=params)
    soup = lxml.html.fromstring(response.text)
    expiry_list = soup.xpath('//form[@id="ocForm"]//option/text()')
    # Drop the first <option>; slicing also avoids an IndexError if the list is empty
    expiry_list = expiry_list[1:]
except Exception as error:
    print("Error:", error)

print("Expiry_Date =", expiry_list)

It works perfectly on my local machine but gives this error on an Amazon EC2 instance. Do any settings need to be changed to resolve the request timeout error?

Upvotes: 0

Views: 175

Answers (1)

alex

Reputation: 806

AWS hosts a lot of bot traffic, so spam blacklists frequently list AWS IPs. Your EC2 instance is probably part of an IP block that is blacklisted. You might be able to verify this by entering your instance's public IP at https://mxtoolbox.com/.

I would first check whether you can make the request at all with curl from the command line on the instance: curl -v {URL}. If that also times out, then I bet your IP is blocked by the remote server's firewall rules.

Since your home IP has access, you can try to set up a VPN on your home network, have the EC2 instance connect to it, and then rerun your Python script. It should work then, but the requests will appear to come from your home connection (so don't do anything stupid). Many routers let you set up an OpenVPN or PPTP VPN right in the admin UI. I suspect that once the IP your EC2 instance appears to come from changes, the upstream server will let you through and you'll be able to scrape.
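If curl is not handy, the same reachability check can be approximated from Python. This is only a minimal sketch using the standard library: the function name check_tcp and the 10-second default timeout are my own choices, not anything from the question. It tries a plain TCP connection and reports whether it connected, timed out, or failed for another reason.

```python
import socket


def check_tcp(host, port=443, timeout=10.0):
    """Try to open a TCP connection; return (reachable, detail).

    check_tcp is a hypothetical helper name, and the default timeout
    is an arbitrary assumption.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except (socket.timeout, TimeoutError):
        # No reply at all, which is what a firewall silently dropping
        # packets tends to look like.
        return False, "timed out"
    except OSError as exc:
        # Connection refused, DNS failure, unreachable network, etc.
        return False, f"failed: {exc}"
```

Calling check_tcp("www1.nseindia.com") on both your local machine and the EC2 instance should show whether the instance can reach the server at all. A "timed out" result on EC2 only would be consistent with the IP being blocked, although it does not prove it.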

Upvotes: 1
