Reputation: 830
I have:
library(XML)
my_URL <- "http://www.velocitysharesetns.com/viix"
tables <- readHTMLTable(my_URL)
The above returns only the table at the top of the page. The pie chart appears to be ignored, presumably because it is rendered with JavaScript. Is there a simple way to extract the two percentage figures shown in the chart?
I have looked at RSelenium, but I am getting errors for which I have not been able to find a solution:
> RSelenium::startServer()
Error in if (file.exists(file) == FALSE) if (!missing(asText) && asText == :
argument is of length zero
In addition: Warning messages:
1: startServer is deprecated.
Users in future can find the function in file.path(find.package("RSelenium"), "example/serverUtils").
The sourcing/starting of a Selenium Server is a users responsiblity.
Options include manually starting a server see vignette("RSelenium-basics", package = "RSelenium")
and running a docker container see vignette("RSelenium-docker", package = "RSelenium")
2: running command '"java" -jar "\\med-fs01/Home/Alex.Badoi/R/win-library/3.3/RSelenium/bin/selenium-server-standalone.jar" -log "\\med-fs01/Home/Alex.Badoi/R/win-library/3.3/RSelenium/bin/sellog.txt"' had status 127
3: running command '"wmic" path win32_process get Caption,Processid,Commandline /format:htable' had status 44210
>
Based on Phillip's answer, I came up with the following solution:
library(XML)
# extract HTML
doc.html <- htmlTreeParse('http://www.velocitysharesetns.com/viix',
                          useInternalNodes = TRUE)
# convert to text
htmltxt <- paste(capture.output(doc.html, file = NULL), collapse = "\n")
# get location of the first occurrence of the string
pos <- regexpr('CBOE SHORT-TERM VIX FUTURE', htmltxt)
# extract the 99 characters starting at "pos"
keep <- substr(htmltxt, pos, pos + 98)
Output:
> keep
[1] "CBOE SHORT-TERM VIX FUTURE DEC 2016', 81.64],\n\n ['CBOE SHORT-TERM VIX FUTURE JAN 2017', 18.36],\n"
Upvotes: 2
Views: 773
Reputation: 1817
Using RSelenium
This solution works for me with RSelenium on Windows 7. I found the relevant class name by inspecting the source of the web page. Note that I use chromedriver.exe.
library(RSelenium)
checkForServer(update = TRUE)
#### I use Chromedriver
startServer(args = c("-Dwebdriver.chrome.driver=C:/Stuff/Scripts/chromedriver.exe"))
remDr <- remoteDriver(remoteServerAddr = "localhost", browserName="chrome", port=4444)
### Open Chrome
remDr$open()
remDr$navigate("http://www.velocitysharesetns.com/viix")
b <- remDr$findElements(using="class name", value="jqplot-pie-series")
sapply(b, function(x){x$getElementAttribute("outerHTML")})
The last command returns
[[1]]
[1] "<div class=\"jqplot-pie-series jqplot-data-label\" style=\"position: absolute; left: 100px; top: 106px;\"><div style=\"color:white;font-weight:bold;\">82%</div></div>"
[[2]]
[1] "<div class=\"jqplot-pie-series jqplot-data-label\" style=\"position: absolute; left: 159px; top: 67px;\"><div style=\"color:white;font-weight:bold;\">18%</div></div>"
You can see that the percentage numbers appear there and can be easily extracted.
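As a minimal sketch of that extraction step, the percentages can be pulled out of the returned strings with a regular expression. The vector `html` below is a hypothetical name that simply holds the two strings shown above; in a live session it would come from `unlist()` applied to the `sapply()` result.

```r
# The two outerHTML strings shown above, stored in a hypothetical vector `html`
html <- c(
  "<div class=\"jqplot-pie-series jqplot-data-label\" style=\"position: absolute; left: 100px; top: 106px;\"><div style=\"color:white;font-weight:bold;\">82%</div></div>",
  "<div class=\"jqplot-pie-series jqplot-data-label\" style=\"position: absolute; left: 159px; top: 67px;\"><div style=\"color:white;font-weight:bold;\">18%</div></div>"
)

# Match the digits that sit directly in front of a "%" sign
# (the lookahead keeps the "%" itself out of the match)
m <- regmatches(html, regexpr("[0-9.]+(?=%)", html, perl = TRUE))

# Convert the matched text to numbers
pct <- as.numeric(m)
print(pct)
```

The lookahead skips the pixel offsets in the `style` attribute, because those numbers are followed by `px` rather than `%`.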
Using just the plain HTML
The data can also be obtained from the HTML source alone, because the values are embedded in it. Somewhere in the source you will find:
<script type="text/javascript" language="javascript">
$(document).ready(function(){
var data = [
['CBOE SHORT-TERM VIX FUTURE DEC 2016', 81.64],
['CBOE SHORT-TERM VIX FUTURE JAN 2017', 18.36],
];
This is what you are looking for. The numbers are rounded before being shown in the chart.
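A rough sketch of how the labels and values could be parsed out of that script block with base R follows. It assumes the page source has already been read into a single string; here the hypothetical variable `src` just holds the snippet shown above, and the pattern assumes each row has the form `['label', number]`.

```r
# Hypothetical input: the relevant part of the page source as one string.
# In practice this might come from paste(readLines(url), collapse = "\n").
src <- paste(
  "var data = [",
  "['CBOE SHORT-TERM VIX FUTURE DEC 2016', 81.64],",
  "['CBOE SHORT-TERM VIX FUTURE JAN 2017', 18.36],",
  "];",
  sep = "\n"
)

# One row looks like ['some label', 12.34]
pattern <- "\\['([^']+)',\\s*([0-9.]+)\\]"

# Find every row, then split each match into its label and its value
rows   <- regmatches(src, gregexpr(pattern, src, perl = TRUE))[[1]]
labels <- sub(pattern, "\\1", rows, perl = TRUE)
values <- as.numeric(sub(pattern, "\\2", rows, perl = TRUE))

# Named numeric vector of the unrounded weights
weights <- setNames(values, labels)
print(weights)
```

Unlike a fixed-width `substr()`, this does not depend on the length of the contract names, and `round(weights)` gives the figures displayed in the chart.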
Upvotes: 3