Reputation: 93
Using the XML package in R, I tried to extract a table with the following code:
url <- "https://in.finance.yahoo.com/intlindices?e=americas"
America <- readHTMLTable(url, which=1, header=TRUE, stringsAsFactors=FALSE)
When I ran this code, I got the following output:
**Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘readHTMLTable’ for signature ‘"NULL"’
In addition: Warning message:
XML content does not seem to be XML: 'https://in.finance.yahoo.com/intlindices?e=americas'**
When I parse the URL on its own, I get the following warning:
**Warning message: XML content does not seem to be XML: **
Could you help me understand whether I am using the wrong package or whether my code is wrong?
Upvotes: 0
Views: 286
Reputation: 59345
Try this:
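library(httr)
library(XML)
# Fetch the page with httr, then parse the response body as HTML
doc <- content(GET(url), type="text/html")
# Select the div that wraps the table and pass its first (only) match to readHTMLTable
readHTMLTable(doc["//div[@id='yfitp']"][[1]])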
# V1 V2 V3 V4 V5
# 1 ^MERV MerVal 10,887.94 12 Sep 1:30am 181.54 (1.64%) Components, Chart, More
# 2 ^BVSP Bovespa 46,400.50 12 Sep 1:47am 103.49 (0.22%) Components, Chart, More
# 3 ^GSPTSE S&P TSX Composite 13,461.47 12 Sep 1:50am 108.42 (0.80%) Chart, More
# 4 ^MXX IPC 42,780.73 12 Sep 1:36am 107.78 (0.25%) Components, Chart, More
# 5 ^GSPC 500 Index 1,961.05 12 Sep 2:02am 8.76 (0.45%) Chart, More
Edit: Clarification based on comment below.
The term doc["//div[@id='yfitp']"] is equivalent to getNodeSet(doc, "//div[@id='yfitp']") and returns a list of the nodes in doc that satisfy the specified XPath filter. Since this is a node set, but readHTMLTable(...) requires a single node, we grab the first node in the node set (also the only node, in this case).
If the question is how to determine the XPath string: I examined the DOM of the page in Firefox, and it was clear that the relevant table was a child node of the div element:
<div id=yfitp>
<table>
...
</table>
</div>
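To try the node-set indexing offline, here is a minimal sketch that parses a small inline HTML string instead of the live page. The HTML content, the column names, and the variable names (html, nodes, tbl) are made up for illustration; only the div id matches the page discussed above. Unlike the call above, it selects the table node itself rather than the enclosing div.

```r
library(XML)

# Hypothetical stand-in for the real page: a div with id 'yfitp' wrapping one table
html <- "<html><body><div id='yfitp'><table>
<tr><th>Symbol</th><th>Name</th></tr>
<tr><td>^MERV</td><td>MerVal</td></tr>
<tr><td>^BVSP</td><td>Bovespa</td></tr>
</table></div></body></html>"

# Parse the string as HTML (asText = TRUE means 'html' is content, not a file name or URL)
doc <- htmlParse(html, asText = TRUE)

# getNodeSet returns a list of all nodes matching the XPath expression
nodes <- getNodeSet(doc, "//div[@id='yfitp']/table")

# readHTMLTable needs a single node, so take the first element of the node set
tbl <- readHTMLTable(nodes[[1]], header = TRUE, stringsAsFactors = FALSE)
print(tbl)
```

If the XPath expression matches nothing, nodes is an empty list and nodes[[1]] fails, so checking length(nodes) first is a reasonable guard when the page layout may change.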
Upvotes: 1