Wacao

Reputation: 51

Access denied - python selenium - even after using User-Agent and other headers

Using Python, I am trying to extract the option chain data table that the NSE publishes publicly at https://www.nseindia.com/option-chain

I tried a requests session as well as Selenium, but the website does not allow a bot to extract the data.

Here is what I have tried so far:

  1. Instead of plain requests, I set up a session and tried to first get a csrf_token from https://www.nseindia.com/api/csrf-token before calling the URL. However, the website seems to perform some additional authorization using JavaScript.
  2. Looking at the XHR and JS tabs of the Chrome developer console, the website seems to run certain JS scripts for first-time authorization, so I switched to Selenium. I also passed user-agent and Accept-Language arguments when loading the driver (as per this Stack Overflow answer), but access is still blocked by the website.

Is there anything obvious that I am missing? Or will the website block every attempt at automated extraction using Selenium or requests with Python? In either case, how do I extract this data?

Below is my current code (to get the table contents from https://www.nseindia.com/option-chain):

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
opts = Options()
opts.add_argument("user-agent=Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36")
opts.add_argument("Accept-Language=en-US,en;q=0.5")
opts.add_argument("Accept=text/html")


driver = webdriver.Chrome(executable_path="C:\\chromedriver.exe",chrome_options=opts)
#driver.get('https://www.nseindia.com/api/csrf-token')
driver.get('https://www.nseindia.com/')
#driver.get('https://www.nseindia.com/api/option-chain-indices?symbol=NIFTY')
driver.get('https://www.nseindia.com/option-chain')

Upvotes: 1

Views: 1396

Answers (1)

Andrej Kesely

Reputation: 195613

The data is loaded via JavaScript from an external URL, but you first need to load cookies by visiting another URL:

import json
import requests


symbol = 'NIFTY'

headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0'}
url = 'https://www.nseindia.com/api/option-chain-indices?symbol=' + symbol

with requests.session() as s:

    # load cookies:
    s.get('https://www.nseindia.com/get-quotes/derivatives?symbol=' + symbol, headers=headers)

    # get data:
    data = s.get(url, headers=headers).json()

    # print data to screen:
    print(json.dumps(data, indent=4))

Prints:

{
    "records": {
        "expiryDates": [
            "03-Sep-2020",
            "10-Sep-2020",
            "17-Sep-2020",
            "24-Sep-2020",
            "01-Oct-2020",
            "08-Oct-2020",
            "15-Oct-2020",
            "22-Oct-2020",
            "29-Oct-2020",
            "26-Nov-2020",
            "31-Dec-2020",
            "25-Mar-2021",
            "24-Jun-2021",
            "30-Dec-2021",
            "30-Jun-2022",
            "29-Dec-2022",
            "29-Jun-2023"
        ],
        "data": [
            {
                "strikePrice": 4600,
                "expiryDate": "31-Dec-2020",
                "PE": {
                    "strikePrice": 4600,
                    "expiryDate": "31-Dec-2020",
                    "underlying": "NIFTY",
                    "identifier": "OPTIDXNIFTY31-12-2020PE4600.00",
                    "openInterest": 19,
                    "changeinOpenInterest": 0,
                    "pchangeinOpenInterest": 0,
                    "totalTradedVolume": 0,
                    "impliedVolatility": 0,
                    "lastPrice": 31,
                    "change": 0,
                    "pChange": 0,
                    "totalBuyQuantity": 10800,
                    "totalSellQuantity": 0,
                    "bidQty": 900,
                    "bidprice": 3.05,
                    "askQty": 0,
                    "askPrice": 0,
                    "underlyingValue": 11647.6
                }
            },
            {
                "strikePrice": 5000,
                "expiryDate": "31-Dec-2020",

...and so on.
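Since the question asks for the table rather than raw JSON, here is a minimal sketch of how the response could be flattened into rows. It assumes the structure shown in the output above (`records` -> `data`, with one nested object per option side). The helper name `flatten_option_chain`, the choice of columns, and the presence of a `"CE"` key alongside `"PE"` are assumptions made for illustration; only `"PE"` appears in the printed sample. The `sample` dict is a trimmed copy of the first entry printed above, so the sketch runs without contacting the site.

```python
def flatten_option_chain(payload, expiry=None):
    """Return one flat dict per strike and option side.

    `payload` is the decoded JSON response. If `expiry` is given, only
    entries with that expiryDate are kept.
    """
    rows = []
    for entry in payload.get("records", {}).get("data", []):
        if expiry is not None and entry.get("expiryDate") != expiry:
            continue
        # "CE" is assumed to mirror "PE"; sides that are absent are skipped.
        for side in ("CE", "PE"):
            leg = entry.get(side)
            if leg is None:
                continue
            rows.append({
                "strikePrice": entry["strikePrice"],
                "expiryDate": entry["expiryDate"],
                "side": side,
                "openInterest": leg.get("openInterest"),
                "lastPrice": leg.get("lastPrice"),
                "bidQty": leg.get("bidQty"),
                "askQty": leg.get("askQty"),
            })
    return rows


# Trimmed copy of the first entry from the output above.
sample = {
    "records": {
        "data": [
            {
                "strikePrice": 4600,
                "expiryDate": "31-Dec-2020",
                "PE": {
                    "strikePrice": 4600,
                    "expiryDate": "31-Dec-2020",
                    "openInterest": 19,
                    "lastPrice": 31,
                    "bidQty": 900,
                    "askQty": 0,
                },
            }
        ]
    }
}

rows = flatten_option_chain(sample)
for row in rows:
    print(row)
```

With the live response, passing `data` instead of `sample` should give one row per strike and side, and `expiry="31-Dec-2020"` would restrict the rows to a single expiry. A list of flat dicts like this can also be handed to a tabular library if one is already in use.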

Upvotes: 1
