Reputation: 322
What is the difference between c() and list()?
I'm learning web scraping and ran into an unexpected error. I wrote a small script to scrape baseball data from a few pages on ESPN's website:
library(magrittr)
library(rvest)
Baseball <- read_html("http://www.espn.com/mlb/stats/batting/_/qualified/true")
Baseball.2 <- read_html("http://www.espn.com/mlb/stats/batting/_/count/41/qualified/true")
Baseball.3 <- read_html("http://www.espn.com/mlb/stats/batting/_/count/81/qualified/true")
Baseball.4 <- read_html("http://www.espn.com/mlb/stats/batting/_/count/121/qualified/true")
Baseball.list <- c(Baseball, Baseball.2, Baseball.3, Baseball.4)
scrape <- function(html) {
  temp.df <- data.frame(1:length(html %>%
    html_nodes(paste0("td:nth-child(2)")) %>%
    html_text()))
  for (i in 2:19) {
    temp.df[i - 1] <-
      html %>%
      html_nodes(paste0("td:nth-child(", i, ")")) %>%
      html_text()
  }
  temp.df
}
When I run df <- lapply(Baseball.list, scrape)
I get:
Error in UseMethod("xml_find_all") :
no applicable method for 'xml_find_all' applied to an object of class "externalptr"
But if I run Baseball.list <- list(Baseball, Baseball.2, Baseball.3, Baseball.4)
and then use lapply with my function in exactly the same way, it works without a problem! I checked the documentation for c()
and see that:
"This is a generic function which combines its arguments.
The default method combines its arguments to form a vector. All arguments are coerced to a common type which is the type of the returned value, and all attributes except names are removed," whereas the documentation for list()
says it coerces objects into a list. Can someone explain why using c()
in this instance causes lapply to fail? I'm not understanding the documentation.
Upvotes: 4
Views: 4668
Reputation: 1490
Exactly as the documentation for c()
says,
"All arguments are coerced to a common type which is the type of the returned value, and all attributes except names are removed"
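list() keeps each document intact, including the classes assigned by xml2::read_html. If you look at the source code for xml2, you'll see that the generic xml_find_all only has methods for the classes xml_missing, xml_node and xml_nodeset. An xml_document is itself a list of two external pointers, node and doc, so c() concatenates those pointers into one flat list and drops the class attribute. lapply then hands each bare externalptr to your function, and there is no xml_find_all method for that class.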
> class(read_html("<html><title>Hi</title></html>"))
[1] "xml_document" "xml_node"
> a = read_html("<html><title>Hi</title></html>")
> b = read_html("<html><title>Hi</title></html>")
> c = read_html("<html><title>Hi</title></html>")
> lapply(c(a,b,c), class)
$node
[1] "externalptr"
$doc
[1] "externalptr"
$node
[1] "externalptr"
$doc
[1] "externalptr"
$node
[1] "externalptr"
$doc
[1] "externalptr"
> lapply(list(a,b,c), class)
[[1]]
[1] "xml_document" "xml_node"
[[2]]
[1] "xml_document" "xml_node"
[[3]]
[1] "xml_document" "xml_node"
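The same behaviour can be reproduced without xml2 or a network connection. The sketch below is only an illustration: make_doc and the classes toy_document and toy_node are hypothetical stand-ins for what read_html returns, assuming (as the output above suggests) that a document is a classed list with node and doc elements.

```r
# Hypothetical stand-in for an xml_document: a classed list with two fields.
# make_doc, toy_document and toy_node are made up for this illustration.
make_doc <- function(id) {
  structure(list(node = id, doc = id),
            class = c("toy_document", "toy_node"))
}

a <- make_doc(1)
b <- make_doc(2)

# c() concatenates the underlying lists and drops the class attribute,
# so the two documents are flattened into four unclassed elements.
combined <- c(a, b)
length(combined)
names(combined)
class(combined)

# list() wraps each object as a single element and leaves it untouched,
# so S3 dispatch on the elements still works.
wrapped <- list(a, b)
length(wrapped)
class(wrapped[[1]])
```

With c() there are four elements named node, doc, node, doc and no toy classes left. With list() there are two elements, each still carrying its original class, which is why lapply over list(...) works with your scrape function.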
Upvotes: 2