Zanam

Reputation: 4807

R Error using readHTMLTable

I am using the following code:

url  = "http://finance.yahoo.com/q/op?s=DIA&m=2013-07"

library(XML)
tabs = readHTMLTable(url, stringsAsFactors = F)

I get the following error:

Error: failed to load external entity "http://finance.yahoo.com/q/op?s=DIA&m=2013-07"

When I open the URL in a browser it works fine. So what am I doing wrong here?

Thanks

Upvotes: 8

Views: 17401

Answers (2)

Raj Ayala

Reputation: 23

I just got the same "failed to load external entity" error when using:

url <- "http://www.cisco.com/c/en/us/products/a-to-z-series-index.html"
doc <- htmlTreeParse(url, useInternalNodes = TRUE)

I came across this post and another one on the topic, but neither solved my problem. The code had worked before. I then realized that I was on a corporate VPN. I disconnected from the VPN, tried again, and it worked. So being on a VPN may be another cause of this error, and disconnecting from it may fix it.

Upvotes: 0

SchaunW

Reputation: 3601

It's difficult to know for sure since I can't replicate your error, but according to the package's author (see http://comments.gmane.org/gmane.comp.lang.r.mac/2284), XML's methods for fetching web content are fairly minimal. A workaround is to use RCurl to get the content and XML to parse it:

library(XML)
library(RCurl)

url <- "http://finance.yahoo.com/q/op?s=DIA&m=2013-07"

# Download the page as a character string, then parse the tables from it
html <- getURL(url)
tabs <- readHTMLTable(html, stringsAsFactors = FALSE)

Or, if RCurl still throws an error, try the httr package:

library(httr)

# Fetch the response with httr, then parse the raw body as text
resp <- GET(url)
tabs <- readHTMLTable(rawToChar(resp$content), stringsAsFactors = FALSE)
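
If it helps, here is a rough sketch of the same idea with the download and the parsing split into two steps, so that a network failure (proxy, VPN, HTTP error) shows up separately from a parsing problem. The helper names parse_tables and fetch_tables and the sample HTML are made up for illustration; only GET, stop_for_status, content, htmlParse and readHTMLTable come from httr and XML. I haven't run this against the Yahoo page, so treat it as a starting point rather than a guaranteed fix.

```r
library(XML)
library(httr)

# Hypothetical helper: parse every <table> found in an HTML string.
parse_tables <- function(html_text) {
  # asText = TRUE tells the parser the argument is content, not a file or URL
  doc <- htmlParse(html_text, asText = TRUE)
  readHTMLTable(doc, stringsAsFactors = FALSE)
}

# Hypothetical helper: download a page and stop with an error on HTTP failures.
fetch_tables <- function(url) {
  resp <- GET(url)
  stop_for_status(resp)  # turns 4xx/5xx responses into R errors
  parse_tables(content(resp, as = "text", encoding = "UTF-8"))
}

# The parsing step can be checked offline with a small made-up page.
sample_html <- paste0(
  "<html><body><table>",
  "<tr><th>Strike</th><th>Last</th></tr>",
  "<tr><td>150</td><td>1.25</td></tr>",
  "</table></body></html>"
)
tabs <- parse_tables(sample_html)
```

If parse_tables works on the sample but fetch_tables(url) fails, the problem is most likely on the network side rather than in the XML package.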

Upvotes: 16
