Reputation: 31
I'm having trouble scraping information from Wikipedia and get the following error message:
Error in if (length(p) > 1 & maxp * n != sum(unlist(nrows)) & maxp * n != :
missing value where TRUE/FALSE needed
I'm not sure how to fix this problem. Please help me out. Here is my code:
library(rvest)
url <- 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
wiki <- read_html(url) %>% html_nodes('table') %>% html_table(fill = TRUE)
names(wiki[[1]])
Output error:
Error in if (length(p) > 1 & maxp * n != sum(unlist(nrows)) & maxp * n != :
missing value where TRUE/FALSE needed
Upvotes: 1
Views: 250
Reputation: 84465
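Assuming you want the large table, you can select it by its id, constituents. An id selector is usually the fastest way to match a single element.
library(rvest)
# Select the table by its id; html_nodes() returns a node set, so r is a list holding one data frame
r <- read_html("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies") %>%
  html_nodes("#constituents") %>%
  html_table()
print(r)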
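Because html_nodes() matches every element for the selector, r above is a list rather than a single data frame. A minimal sketch of the difference, run on a small made-up HTML snippet instead of the live page (the snippet, symbols and values are placeholders, not the real Wikipedia content):

```r
library(rvest)

# Hypothetical stand-in for the page: two tables, only the first has an id
page <- read_html('
  <table id="constituents">
    <tr><th>Symbol</th><th>Security</th></tr>
    <tr><td>AAA</td><td>Example Corp</td></tr>
  </table>
  <table>
    <tr><th>Date</th><th>Change</th></tr>
    <tr><td>2020-01-01</td><td>Added</td></tr>
  </table>')

# html_nodes() returns a node set, so html_table() gives a list of tables
as_list <- page %>% html_nodes("#constituents") %>% html_table()
length(as_list)

# html_node() returns only the first match, so html_table() gives the table itself
as_df <- page %>% html_node("#constituents") %>% html_table()
names(as_df)
```

If you only need the one table, html_node() saves the extra r[[1]] step.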
Upvotes: 2
Reputation: 20399
The problem is that there are two tables on this page, and you should specify which one you want to scrape. Assuming you want the first one, you could do something like:
read_html(url) %>%
  html_nodes('table') %>%
  `[[`(1) %>% ## extract first table
  html_table(fill = TRUE)
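If you are not sure which position the table you want occupies, one option is to list the id attribute of every table before indexing. A minimal sketch on a made-up inline page (the HTML and its values are placeholders, not the real Wikipedia content):

```r
library(rvest)

# Hypothetical stand-in for the page: two tables, only the first has an id
page <- read_html('
  <table id="constituents">
    <tr><th>Symbol</th><th>Security</th></tr>
    <tr><td>AAA</td><td>Example Corp</td></tr>
  </table>
  <table>
    <tr><th>Date</th><th>Change</th></tr>
    <tr><td>2020-01-01</td><td>Added</td></tr>
  </table>')

tables <- page %>% html_nodes("table")

# One entry per table; tables without an id attribute show up as NA
ids <- tables %>% html_attr("id")
print(ids)

# Pick the table by position once you know which one it is
first_table <- tables %>% `[[`(1) %>% html_table()
print(first_table)
```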
Upvotes: 1