Øystein Vaagen

Reputation: 31

Error when web scraping from Wikipedia in R

I'm having trouble scraping information from Wikipedia and get the following error message:

Error in if (length(p) > 1 & maxp * n != sum(unlist(nrows)) & maxp * n !=  :
  missing value where TRUE/FALSE needed

I'm not sure how to fix this problem. Please help me out.

library(rvest)

url <- 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
wiki <- read_html(url) %>% html_nodes('table') %>% html_table(fill = TRUE)

names(wiki[[1]])

Output error:


Error in if (length(p) > 1 & maxp * n != sum(unlist(nrows)) & maxp * n !=  : 
  missing value where TRUE/FALSE needed

Upvotes: 1

Views: 250

Answers (2)

QHarr

Reputation: 84465

Assuming you want the big table, you can select it by its id. An id selector should be the fastest way to match a single element.

library(rvest)

# Select the table by its id; html_nodes() returns a node set, so html_table() gives a list
r <- read_html("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies") %>%
  html_nodes("#constituents") %>%
  html_table()
print(r)
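
To try the id selector without hitting the network, here is a minimal sketch on a small inline page. The HTML, the second table's id, and the ticker values are made up for illustration; only the "#constituents" id comes from the answer above. It uses html_node() (singular), so html_table() should return one data frame instead of a list.

```r
library(rvest)

# Hypothetical stand-in for the Wikipedia page: two tables, each with its own id
page <- read_html('
<html><body>
  <table id="constituents">
    <tr><th>Symbol</th><th>Security</th></tr>
    <tr><td>AAA</td><td>Example Corp</td></tr>
    <tr><td>BBB</td><td>Sample Inc</td></tr>
  </table>
  <table id="changes">
    <tr><th>Date</th><th>Reason</th></tr>
    <tr><td>2020-01-01</td><td>Placeholder</td></tr>
  </table>
</body></html>')

# html_node() matches at most one element, so the result is a single table
constituents <- page %>%
  html_node("#constituents") %>%
  html_table()

names(constituents)
```

On the real page, the same pipeline would replace the inline HTML with the Wikipedia URL.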

Upvotes: 2

thothal

Reputation: 20399

The problem is that there are two tables on this webpage, and you should specify which one you want to scrape. Assuming you want the first one, you could do something like:

read_html(url) %>% 
  html_nodes('table') %>% 
  `[[`(1) %>% ## extract first table
  html_table(fill = TRUE) 
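
If you are unsure which table is which, a rough sketch like the following may help: it counts the tables and parses each one separately, so a table that fails to parse does not abort the others. The inline HTML and the variable names (tables, parsed, first_table) are assumptions for illustration, not taken from the Wikipedia page.

```r
library(rvest)

# Hypothetical two-table page used in place of the real URL
page <- read_html('
<html><body>
  <table>
    <tr><th>Symbol</th><th>Security</th></tr>
    <tr><td>AAA</td><td>Example Corp</td></tr>
  </table>
  <table>
    <tr><th>Date</th><th>Reason</th></tr>
    <tr><td>2020-01-01</td><td>Placeholder</td></tr>
  </table>
</body></html>')

tables <- html_nodes(page, "table")
n_tables <- length(tables)  # number of <table> elements found

# Parse each table on its own; a table that errors becomes NULL
parsed <- lapply(tables, function(tbl) {
  tryCatch(html_table(tbl), error = function(e) NULL)
})

first_table <- parsed[[1]]
names(first_table)
```

Any NULL entry in parsed then points at the table that triggers the error.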

Upvotes: 1
