Reputation: 3318
I am looking for a non-Selenium way to mine data from a website using R (preferably) or Python.
In R I used the code below:
library(rvest)
library(XML)
Link = 'https://www.bseindia.com/stock-share-price/itc-ltd/itc/500875/'
read_html(Link) %>% html_nodes(".textvalue .ng-binding") %>% html_text()
## character(0)
Ideally I should be able to get most of the numerical values, but as you can see it did not return anything. Any pointer towards the right approach would be highly appreciated.
I also tried the BeautifulSoup module from Python, as below, without any success:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
uClient = uReq("https://www.bseindia.com/stock-share-price/itc-ltd/itc/500875/")
page_html = uClient.read()
page_soup = soup(page_html, 'html.parser')
page_soup.findAll("div", {"class":"textvalue.ng-binding"})
Thanks,
Upvotes: 1
Views: 219
Reputation: 84475
This is easy as you can use the API that the page itself calls (the values are filled in client-side by Angular, which is why scraping the static HTML returns nothing). The returned JSON has all the values, but I am printing only one.
Python:
import requests
r = requests.get('https://api.bseindia.com/BseIndiaAPI/api/StockTrading/w?flag=&quotetype=EQ&scripcode=500875').json()
print(r['MktCapFF'])
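As an aside (the answer above does not depend on this), the endpoint returns a fairly large JSON object, and some BSE endpoints reject bare requests. A minimal sketch for exploring it, where the browser-like headers are an assumption rather than part of the original answer:
import requests
# Hypothetical browser-like headers; an assumption, only needed if the API rejects a bare request
headers = {'User-Agent': 'Mozilla/5.0', 'Referer': 'https://www.bseindia.com/'}
url = 'https://api.bseindia.com/BseIndiaAPI/api/StockTrading/w?flag=&quotetype=EQ&scripcode=500875'
r = requests.get(url, headers=headers).json()
print(sorted(r.keys()))                        # list every field the API returns
print(r.get('MktCapFull'), r.get('MktCapFF'))  # the two fields used in this answer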
R:
library(rvest)
library(jsonlite)
r <- read_html('https://api.bseindia.com/BseIndiaAPI/api/StockTrading/w?flag=&quotetype=EQ&scripcode=500875') %>% html_text() %>% jsonlite::fromJSON(.)
print(r$MktCapFull)
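A slightly more direct R variant, sketched under the assumption that the same endpoint is used, is to skip read_html() and let jsonlite fetch the URL itself:
library(jsonlite)
url <- 'https://api.bseindia.com/BseIndiaAPI/api/StockTrading/w?flag=&quotetype=EQ&scripcode=500875'
res <- fromJSON(url)   # fromJSON() accepts a URL and returns a named list
print(res$MktCapFull)  # same field as above; names(res) shows everything else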
Upvotes: 1