Panda_moan_i_am

Reputation: 15

Loop through a list of URLs and download HTML tables in R

I am trying to download player information from basketball reference.

I have a CSV that I have imported as a data frame (data_allplayers). It has two columns: one with the URL and one with the name that I want to save each file as.

url                                                                      name
https://www.basketball-reference.com/players/g/gordoaa01/gamelog/2020   Aaron Gordon
https://www.basketball-reference.com/players/h/holidaa01/gamelog/2020   Aaron Holiday
https://www.basketball-reference.com/players/n/naderab01/gamelog/2020   Abdel Nader

and so on, for 529 rows.

I want to loop through it, read the main data table at each URL into a data frame, and save that data frame under the player's name.

I can download these tables perfectly well, but only manually/individually, using:

#player1
webpage <- read_html("https://www.basketball-reference.com/players/g/gordoaa01/gamelog/2020")
# pull every <table> node on the page
tbls <- html_nodes(webpage, "table") %>% 
  html_table(fill = TRUE)
# the game log is the 8th table
Aaron_Gordon <- as.data.frame(tbls[8])

But I am having no joy turning this into a loop using the URLs already in my data frame. The full code I have tried is below; any help is greatly appreciated!

# Load libraries
library(dplyr) 
library(readxl)
library(rvest)
library(data.table) 
library(readr)
library(plyr)



data_allplayers <- read_csv("NBA_rebounds - players1.csv")
#delete the unwanted columns, add headers
data_allplayers <- select(data_allplayers, url, full_name)
header <- c("url", "name")
setnames(data_allplayers, header)
#removes first row
data_allplayers <- data_allplayers[-c(1), ]


#attempt at loop that doesn't work

for(i in 1:nrow(data_allplayers)){
  webpage <- read_html(data_allplayers$url[[i,]])
  tbls <- html_nodes(webpage, "table") %>% 
    html_table(fill = TRUE)
  Data_scrape <- as.data.frame(tbls[8])
  Report1_Name <- data_allplayers$name[[i,]]
  write.csv(Data_scrape, paste0(Report1_Name,".csv"))
}

Upvotes: 1

Views: 286

Answers (1)

Ronak Shah

Reputation: 389175

Here is one way to do this with Map:

library(rvest)

# for each (url, name) pair: read the page, take the 8th table, write it to "<name>.csv"
Map(function(x, y) {
  read_html(x) %>%
    html_nodes('table') %>%
    html_table(fill = TRUE) %>%
    .[[8]] %>%
    write.csv(paste0(y, '.csv'), row.names = FALSE)
}, data_allplayers$url, data_allplayers$name)

This works fine for me for the 3 values that you shared.

data_allplayers <- structure(list(url = c("https://www.basketball-reference.com/players/g/gordoaa01/gamelog/2020 ", 
"https://www.basketball-reference.com/players/h/holidaa01/gamelog/2020 ", 
"https://www.basketball-reference.com/players/n/naderab01/gamelog/2020 "
), name = c(" Aaron Gordon", " Aaron Holiday", " Abdel Nader"
)), class = "data.frame", row.names = c(NA, -3L))
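With 529 URLs it may also be worth pausing between requests and catching per-player failures so that one bad page does not stop the whole run. Here is a minimal sketch of that idea, assuming the same data_allplayers columns as above; the scrape_one helper name and the 3-second delay are illustrative choices rather than anything rvest requires.

library(rvest)

# Illustrative helper: scrape one player's game log (8th table on the page)
# and write it to "<name>.csv"; on error, report the player and return NULL.
scrape_one <- function(x, y) {
  tryCatch({
    read_html(x) %>%
      html_nodes('table') %>%
      html_table(fill = TRUE) %>%
      .[[8]] %>%
      write.csv(paste0(y, '.csv'), row.names = FALSE)
    Sys.sleep(3)   # pause between requests to avoid hammering the site
    y
  }, error = function(e) {
    message('Failed for ', y, ': ', conditionMessage(e))
    NULL
  })
}

done <- Map(scrape_one, data_allplayers$url, data_allplayers$name)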

Upvotes: 1
