Giulio Mario Martena

Reputation: 39

Is there an R function to read multiple CSVs at once from a GitHub repo?

What I'm trying to do is grab all the CSVs from this repo page. I know that I need to grab the raw version of each file, but there are a lot of them, and I need to bind their rows with rbind for further calculations. Is there a function to download all of them at once?

P.S.: of course I don't want to download each file locally; I only want to read each one and keep the resulting data frame in my environment.
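(For reference, a single file can already be read straight from its raw URL without writing anything to disk; a minimal sketch, where the filename is hypothetical and only follows the repo's naming pattern:

# Read one CSV directly from its raw GitHub URL; nothing is saved locally.
# The filename below is hypothetical -- substitute a real file from the repo.
one_file <- read.csv(
  "https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-regioni/dpc-covid19-ita-regioni-20200224.csv"
)

The question is how to do this for every CSV in the directory at once.)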

Upvotes: 1

Views: 431

Answers (1)

Ronak Shah

Reputation: 389215

You can do this with a combination of some web scraping using rvest. Basically, we dynamically build the raw URL for each file and combine them into one data frame with purrr::map_df.

library(dplyr)
library(rvest)

url <- "https://github.com/pcm-dpc/COVID-19/tree/master/dati-regioni"

url %>%
  read_html() %>%                                    # download the repo page HTML
  html_nodes(xpath = '//*[@role="rowheader"]') %>%   # one rowheader per file in the listing
  html_nodes('span a') %>%                           # the link inside each rowheader
  html_attr('href') %>%                              # extract the relative file path
  head %>% # <- remove this line to read all the files.
  sub('blob/', '', .) %>%                            # drop 'blob/' to get the raw path
  paste0('https://raw.githubusercontent.com', .) %>% # build the raw content URL
  purrr::map_df(read.csv) -> combined_data           # read each CSV and row-bind

Note that I have added head to test the answer on the first 6 files only. Remove it to read all the files in the directory.
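If the scraping ever breaks (GitHub's page HTML changes over time), the same idea can be sketched against the GitHub contents API instead. The endpoint and the download_url field are part of the documented GitHub REST API, but treat this as an untested sketch rather than a drop-in replacement; note the API is rate-limited for unauthenticated requests.

library(jsonlite)
library(purrr)

# List the directory via the GitHub contents API instead of scraping HTML.
api_url <- "https://api.github.com/repos/pcm-dpc/COVID-19/contents/dati-regioni"
files <- fromJSON(api_url)  # one row per file, including a ready-made raw URL

# Keep only CSV entries and read each one from its download_url.
csv_urls <- files$download_url[grepl("\\.csv$", files$name)]
combined_data <- map_df(csv_urls, read.csv)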

Upvotes: 2
