Reputation: 25
I 'm trying to scrape http://then.gasbuddy.com/.
I'm running the next code in R
library(RCurl)
library(XML)
doc <- htmlTreeParse('http://www.southcarolinagasprices.com/GasPriceSearch.aspx?typ=adv&fuel=A&srch=0&area=All%20Areas&station=All%20Stations&tme_limit=4')
rootNode <- xmlRoot(doc)
((rootNode[[2]][4])[1][[1]])[[15]][[1]][[11]][[1]][[1]][[2]][[8]][[1]][[2]][[1]][[1]][[1]][[1]][[1]][[1]]
#<div class="p1"/>
x <- matrix(, nrow = 20, ncol = 4)
x[1,1] <- xmlValue(((rootNode[[2]][4])[1][[1]])[[15]][[1]][[11]][[1]][[1]][[2]][[8]][[1]][[2]][[1]][[1]][[1]][[1]][[1]][[1]])
But I have this error
replacement has length zero
How can I subtract p1 and put it in a matrix?
Upvotes: 0
Views: 924
Reputation: 78832
You've come up with an interesting way to get around their price obfuscation. Since they didn't restrict scraping in their Terms of Service, here's one way you can scrape the prices:
library(xml2)
doc <- read_html('http://www.southcarolinagasprices.com/GasPriceSearch.aspx?typ=adv&fuel=A&srch=0&area=All%20Areas&station=All%20Stations&tme_limit=4')
prices <- xml_find_all(doc, xpath="//div[@class='sp_p']")
sapply(prices, function(x) {
as.numeric(paste(gsub("d", "\\.",
gsub("^p", "",
unlist(xml_attrs(xml_find_all(x, "./div"))))),
collapse=""))
})
## [1] 1.65 1.65 1.65 1.65 1.65 1.65 1.65 1.65 1.65 1.67 1.68 1.69 1.69 1.69 1.69 1.69 1.69 1.69 1.69
## [20] 1.70 1.71 1.72 1.72 1.73 1.73 1.73 1.73 1.73 1.73 1.73 1.73 1.73 1.74 1.74 1.74 1.74 1.74 1.74
## [39] 1.74 1.74 1.74 1.75 1.75 1.75 1.75 1.75 1.75 1.75 1.75 1.75 1.75 1.75 1.75 1.75 1.75 1.76 1.76
## [58] 1.76 1.76 1.76 1.76 1.76 1.76 1.76 1.76 1.76 1.76 1.77 1.77 1.77 1.77 1.77 1.77 1.77 1.77 1.77
## [77] 1.77 1.77 1.77 1.77 1.77 1.77 1.77 1.77 1.77 1.77 1.77 1.77 1.77 1.77 1.77 1.77 1.77 1.77 1.77
## [96] 1.77 1.77 1.77 1.77 1.77 1.77 1.77 1.77 1.77 1.78 1.78 1.78 1.78 1.78 1.78 1.78 1.78 1.78 1.78
## [115] 1.78 1.78 1.78 1.78 1.78 1.78 1.78 1.78 1.78 1.78 1.78 1.78 1.78 1.78 1.78 1.78 1.78 1.78 1.78
## [134] 1.78 1.78 1.78 1.78 1.78 1.78 1.78 1.78 1.78 1.78 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79
## [153] 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79
## [172] 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79
## [191] 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79
Upvotes: 2
Reputation: 13903
The error means what it says. Look at the return value from
xmlValue(((rootNode[[2]][4])[1][[1]])[[15]][[1]][[11]][[1]][[1]][[2]][[8]][[1]][[2]][[1]][[1]][[1]][[1]][[1]][[1]])
It's
character(0)
Because <div class="p1"/>
is a self-closing tag that doesn't contain any text. As the error message indicates, it's an error in R to replace part of a vector with something that has length zero. If you want these length-zero results to return something like NA
or ""
, you need to use an if
/else
construction.
Upvotes: 1