rajat kathuria

Reputation: 13

How to scrape fundamentals data of NSE indices (NIFTY 50) using R

I am trying to scrape the fundamentals data table (P/E ratio, P/B ratio and dividend yield) from the NSE website. I tried the following with the rvest package:

library(rvest)
url <- "https://www1.nseindia.com/products/content/equities/indices/historical_pepb.htm"
pgsession <- html_session(url)

But, I receive this error:

Error in curl::curl_fetch_memory(url, handle = handle) :
LibreSSL SSL_read: SSL_ERROR_SYSCALL, errno 60

I also tried the httr package (CSS selectors identified using the Chrome extension 'SelectorGadget'):

library(httr)

fd <- list(submit = "Get Data", # Not sure if it's the correct CSS selector
           IndexName = "NIFTY 50",
           fromDate = "01-06-2020",
           toDate = "15-06-2020")

resp <- POST(url, body = fd, encode = "form")

But I receive the same error. I have searched many forums while troubleshooting the problem, and it seems the website is blocking scraping attempts. Can someone confirm this or suggest a way to scrape the table from this website?

Upvotes: 0

Views: 1233

Answers (2)

rajat kathuria

Reputation: 13

Here's a (crude) wrapper to fetch NIFTY 50 fundamentals data from the NSE website:

get.nse.ratios <- function(index.nse = 'NIFTY 50', date.start = as.Date('2001-01-01'), date.end = as.Date(Sys.time())){
  # url.base <- 'https://www1.nseindia.com/products/content/equities/indices/historical_pepb.htm'
  index.nse <- gsub(' ', '%20', index.nse)
  
  # Split Date range into acceptable range
  max.history.constraint <- 100
  dates.start <- seq.Date(date.start, date.end, by = max.history.constraint)
  data.master <- data.frame()
  # Loop over sub-periods to extract data
  for(i in seq_along(dates.start)){
    # Index into the vector: iterating over it directly drops the Date class
    fromDate <- dates.start[i]
    # Cap each sub-period at the requested end date
    toDate <- min(fromDate + (max.history.constraint - 1), date.end)
    
    cat(sprintf('Fetching data from %s to %s \n', format(fromDate), format(toDate)))
    # Reformat dates to the dd-mm-yyyy form used in the URL
    fromDate <- format(fromDate, '%d-%m-%Y')
    toDate <- format(toDate, '%d-%m-%Y')
    
    # Infer url for sub-period
    url.sub <- sprintf("https://www1.nseindia.com/products/dynaContent/equities/indices/historical_pepb.jsp?indexName=%s&fromDate=%s&toDate=%s&yield1=undefined&yield2=undefined&yield3=undefined&yield4=all", index.nse, fromDate, toDate)
    
    # Scrape table from inferred url
    data.sub <- rvest::html_table(xml2::read_html(url.sub))[[1]]
    
    # Clean the table
    names.columns <- unname(unlist(data.sub[2,]))
    data.clean <- data.sub[3:(nrow(data.sub)-1),]
    colnames(data.clean) <- names.columns
    data.clean$Date <- as.Date(data.clean$Date, format = '%d-%b-%Y')
    cols.num <- names(which(sapply(data.clean, class) == 'character'))
    data.clean[cols.num] <- sapply(data.clean[cols.num],as.numeric)
    
    # Append to master data
    data.master <- rbind(data.master, data.clean)

  }
  
  return(data.master)
}
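
The date-splitting step in the wrapper can be checked on its own, without hitting the website. This is only a minimal sketch, assuming the same 100-day limit per request that the wrapper uses; the helper name make.date.windows is hypothetical and not part of any package.

```r
# Hypothetical helper: split [date.start, date.end] into consecutive windows
# of at most max.days days each, mirroring the loop in get.nse.ratios().
make.date.windows <- function(date.start, date.end, max.days = 100) {
  from <- seq(date.start, date.end, by = max.days)  # window start dates
  to <- pmin(from + (max.days - 1), date.end)       # cap the last window
  data.frame(from = from, to = to)
}

windows <- make.date.windows(as.Date('2020-01-01'), as.Date('2020-06-15'))
print(windows)
```

Each row gives the fromDate and toDate for one request; the last window is cut off at the requested end date.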

Upvotes: 0

Bas

Reputation: 4658

If you right-click the page, click 'Inspect element', and go to the 'Network' tab, you can see the request being made when you click the 'Get data' button.

In this case, the request goes to the URL below, which can easily be read and parsed into a data frame using, for example, rvest::html_table().

By changing the query parameters in the URL (indexName, fromDate and toDate), you should be able to extract the table you want.

url <- "https://www1.nseindia.com/products/dynaContent/equities/indices/historical_pepb.jsp?indexName=NIFTY%2050&fromDate=01-06-2020&toDate=02-06-2020&yield1=undefined&yield2=undefined&yield3=undefined&yield4=all"

rvest::html_table(xml2::read_html(url))[[1]]

gives

  Historical NIFTY 50  P/E, P/B & Div. Yield values Historical NIFTY 50  P/E, P/B & Div. Yield values
1           For the period 01-06-2020 to 02-06-2020           For the period 01-06-2020 to 02-06-2020
2                                              Date                                               P/E
3                                       01-Jun-2020                                             22.96
4                                       02-Jun-2020                                             23.31
5                       Download file in csv format                       Download file in csv format
  Historical NIFTY 50  P/E, P/B & Div. Yield values Historical NIFTY 50  P/E, P/B & Div. Yield values
1           For the period 01-06-2020 to 02-06-2020           For the period 01-06-2020 to 02-06-2020
2                                               P/B                                         Div Yield
3                                              2.80                                              1.55
4                                              2.84                                              1.53
5                       Download file in csv format                       Download file in csv format
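
The raw table still carries a title row, a header row and a footer row. The sketch below shows one possible way to tidy it, assuming the layout stays exactly as printed above. The raw data frame is a hand-typed stand-in for the scraped result (with generic column names), so the code runs without network access.

```r
# Stand-in for rvest::html_table(xml2::read_html(url))[[1]], typed in from
# the output above. Assumption: the real table has the same row layout.
period <- 'For the period 01-06-2020 to 02-06-2020'
footer <- 'Download file in csv format'
raw <- as.data.frame(rbind(
  rep(period, 4),
  c('Date', 'P/E', 'P/B', 'Div Yield'),
  c('01-Jun-2020', '22.96', '2.80', '1.55'),
  c('02-Jun-2020', '23.31', '2.84', '1.53'),
  rep(footer, 4)
), stringsAsFactors = FALSE)

# Row 2 holds the real column names; the last row is the download link.
clean <- raw[3:(nrow(raw) - 1), ]
names(clean) <- unlist(raw[2, ], use.names = FALSE)
rownames(clean) <- NULL

# Month abbreviations are parsed with %b, which depends on the locale.
clean$Date <- as.Date(clean$Date, format = '%d-%b-%Y')
num.cols <- setdiff(names(clean), 'Date')
clean[num.cols] <- lapply(clean[num.cols], as.numeric)
print(clean)
```

Using lapply() rather than sapply() keeps the result a list of columns even when the table has only one data row.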

Upvotes: 1
