Pecners

Reputation: 3

Reading numerous html tables into R

I'm trying to pull HTML data tables into a single data frame, and I'm looking for an elegant solution. There are 255 tables, and the URLs vary by two variables: year and aldermanic district. I know there must be a way to use a for loop or something similar, but I'm stumped.

I have successfully imported the data by reading each table in with a separate line of code, but that means one line per table, and again, there are 255 tables.

library(XML)
library(dplyr)  # for bind_rows

data <- bind_rows(readHTMLTable("http://assessments.milwaukee.gov/SalesData/2018_RVS_Dist14.htm", skip.rows=1),
                  readHTMLTable("http://assessments.milwaukee.gov/SalesData/2017_RVS_Dist14.htm", skip.rows=1),
                  readHTMLTable("http://assessments.milwaukee.gov/SalesData/2016_RVS_Dist14.htm", skip.rows=1),
                  readHTMLTable("http://assessments.milwaukee.gov/SalesData/2015_RVS_Dist14.htm", skip.rows=1),
                  # ... and so on, one readHTMLTable call per table
                  )

Ideally, I could use a for loop or something similar so I wouldn't have to hand-code a readHTMLTable call for each table.

Upvotes: 0

Views: 46

Answers (2)

www

Reputation: 39154

We can use map_dfr from the purrr package (part of the tidyverse) to apply the readHTMLTable function across the URLs. The key is to identify the part that differs between URLs. In this case the year 2015:2018 is the only thing that changes, so we can construct each URL with paste0. map_dfr automatically combines all the data frames into one combined data frame. dat is the final output.

library(tidyverse)
library(XML)

dat <- map_dfr(2015:2018,
               ~readHTMLTable(paste0("http://assessments.milwaukee.gov/SalesData/",
                                     .x,
                                     "_RVS_Dist14.htm"), skip.rows = 1)[[1]])

Update

Here is a way to expand all combinations of year and district number with expand.grid, and then download the data with map2_dfr.

url <- expand.grid(Year = 2002:2018, Number = 1:15)

dat <- map2_dfr(url$Year, url$Number,
                ~readHTMLTable(paste0("http://assessments.milwaukee.gov/SalesData/",
                                      .x,
                                      "_RVS_Dist",
                                      .y,
                                      ".htm"), skip.rows = 1)[[1]])
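One caveat with the full grid: it has 17 × 15 = 255 URLs, and if any year/district combination has no page, readHTMLTable will throw an error and abort the whole map. A sketch using purrr::possibly to skip failures instead (the helper name read_table_safely is made up here):

```r
library(tidyverse)
library(XML)

# possibly() returns NULL for any URL that errors; map2_dfr / bind_rows
# silently drop NULL elements, so missing pages are simply skipped.
read_table_safely <- possibly(
  function(year, dist) {
    readHTMLTable(paste0("http://assessments.milwaukee.gov/SalesData/",
                         year, "_RVS_Dist", dist, ".htm"),
                  skip.rows = 1)[[1]]
  },
  otherwise = NULL
)

url <- expand.grid(Year = 2002:2018, Number = 1:15)  # 255 combinations
dat <- map2_dfr(url$Year, url$Number, read_table_safely)
```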

Upvotes: 1

Tim Biegeleisen

Reputation: 521073

You could try creating a vector containing all the URLs which you want to scrape, and then iterate over those inputs using a for loop:

library(XML)

url1 <- "http://assessments.milwaukee.gov/SalesData/"
url2 <- "_RVS_Dist"
years <- 2015:2018
dist <- 1:15
urls <- apply(expand.grid(paste0(url1, years), paste0(url2, dist, ".htm")), 1, paste, collapse="")
data <- NULL
for (url in urls) {
    df <- readHTMLTable(url, skip.rows = 1)[[1]]  # first (and only) table on the page
    data <- rbind(data, df)
}
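As a side note, growing data with rbind inside the loop re-copies the accumulated data frame on every iteration. A sketch of an equivalent approach, assuming the same URL scheme, that reads everything into a list and combines once at the end:

```r
library(XML)

url1 <- "http://assessments.milwaukee.gov/SalesData/"
url2 <- "_RVS_Dist"
urls <- apply(expand.grid(paste0(url1, 2015:2018), paste0(url2, 1:15, ".htm")),
              1, paste, collapse = "")  # 4 years x 15 districts = 60 URLs

# One readHTMLTable call per URL, collected in a list, then bound in one pass.
tables <- lapply(urls, function(u) readHTMLTable(u, skip.rows = 1)[[1]])
data <- do.call(rbind, tables)
```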

Upvotes: 1
