Reputation: 683
I'm using a loop to scrape tables from a website, but I can't figure out how to combine them into one data frame. The following code scrapes the relevant table from a single page, but I'm not sure how to append each new table to the first one (or to a preexisting one). Thanks.
for (i in 1:10) {
  link <- paste0("https://website.com/page", i)
  remDr$navigate(link)
  # grab the html
  pg <- remDr$getPageSource() %>% .[[1]] %>%
    read_html()
  # grab the correct table
  table <- pg %>%
    html_nodes("table") %>%
    .[2] %>%
    html_table(fill = TRUE) %>%
    .[[1]]
  # combine tables?
}
Upvotes: 0
Views: 170
Reputation: 1007
If you want to keep the loop, create an empty data frame before the loop and append to it on every iteration with rbind:
big_df <- data.frame()
for (i in 1:10) {
  link <- paste0("https://website.com/page", i)
  remDr$navigate(link)
  # grab the html
  pg <- remDr$getPageSource() %>% .[[1]] %>%
    read_html()
  # grab the correct table
  table <- pg %>%
    html_nodes("table") %>%
    .[2] %>%
    html_table(fill = TRUE) %>%
    .[[1]]
  # append this page's table to the running result
  big_df <- rbind(big_df, table)
}
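If you keep the loop, one variant worth considering is to store each table in a preallocated list and bind everything once at the end, so the data frame is not rebuilt on every pass. Below is a minimal sketch of that pattern. The function fake_scrape is a hypothetical stand-in for the Selenium/rvest step and n_pages is an assumed page count; neither comes from the original code.

```r
# Hypothetical stand-in for the scraping step: returns one small table per page.
fake_scrape <- function(i) {
  data.frame(page = i, item = c("a", "b"), value = c(i * 10, i * 10 + 1))
}

n_pages <- 3                        # assumed number of pages for this sketch
tables <- vector("list", n_pages)   # preallocate one slot per page

for (i in seq_len(n_pages)) {
  tables[[i]] <- fake_scrape(i)     # store this page's table in its slot
}

# Bind all tables in one call instead of growing a data frame inside the loop
big_df <- do.call(rbind, tables)
```

In the real loop, the body of fake_scrape would be replaced by the navigate / getPageSource / html_table steps shown above. Note that rbind expects every table to have the same columns.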
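A better (and faster) way is to put the loop body in a function, lapply it over 1:10 to get a list of data frames, and then combine them in a single call to data.table::rbindlist, which avoids re-copying the accumulated data frame on every iteration. The result is a data.table, which is also a data.frame: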
df_list <- lapply(1:10, function(i) {
  link <- paste0("https://website.com/page", i)
  remDr$navigate(link)
  # grab the html
  pg <- remDr$getPageSource() %>% .[[1]] %>%
    read_html()
  # grab the correct table
  table <- pg %>%
    html_nodes("table") %>%
    .[2] %>%
    html_table(fill = TRUE) %>%
    .[[1]]
  return(table)
})
big_df <- data.table::rbindlist(df_list)
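Two rbindlist arguments may be useful if the pages are not perfectly uniform: fill = TRUE pads missing columns with NA, and idcol adds a column recording which list element each row came from. Here is a small sketch with made-up tables (the data and the column name "page" are assumptions for illustration, not scraped output):

```r
library(data.table)

# Hypothetical stand-ins for two scraped tables; the second has an extra column
df_list <- list(
  data.frame(item = c("a", "b"), value = c(1, 2)),
  data.frame(item = "c", value = 3, note = "extra")
)

# fill = TRUE pads the missing 'note' column with NA;
# idcol = "page" records the position of each source table in the list
big_dt <- rbindlist(df_list, fill = TRUE, idcol = "page")

# Convert to a plain data frame if data.table behaviour is not wanted downstream
big_df <- as.data.frame(big_dt)
```

Because df_list is unnamed here, the page column holds the integer position of each table in the list, which matches the page number only when the list was built from 1:n in order.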
Upvotes: 1