Reputation: 39
What I'm trying to do is to grab all the CSVs from this repo page. I know that I need to read the raw version of each file, but there are a lot of them, and I need to bind all their rows with rbind in order to do further calculations. Is there a function to download all of them at once?
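For a single file I can already do something like the sketch below (the two raw URLs are just placeholders, not real paths), but listing every file by hand doesn't scale:
# what I do now for a couple of files (placeholder URLs)
url_1 <- "https://raw.githubusercontent.com/<user>/<repo>/master/<file-1>.csv"
url_2 <- "https://raw.githubusercontent.com/<user>/<repo>/master/<file-2>.csv"
combined <- rbind(read.csv(url_1), read.csv(url_2))  # fine for two files, not for the whole folder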
P.S.: of course I don't want to download each file locally; I only want to read each one and keep just the resulting data frame in my environment.
Upvotes: 1
Views: 431
Reputation: 389215
You can do this with a combination of some web scraping using rvest. Basically, we scrape the file links from the directory listing, dynamically build the raw URL for each file, read the data from it, and combine everything into one data frame using map_df.
library(dplyr)
library(rvest)

url <- "https://github.com/pcm-dpc/COVID-19/tree/master/dati-regioni"

url %>%
  read_html() %>%
  # select the file entries listed on the directory page
  html_nodes(xpath = '//*[@role="rowheader"]') %>%
  html_nodes('span a') %>%
  html_attr('href') %>%
  head %>% # <- remove this line to read all the files.
  # turn each "blob" link into its raw.githubusercontent.com equivalent
  sub('blob/', '', .) %>%
  paste0('https://raw.githubusercontent.com', .) %>%
  # read every CSV and row-bind them into one data frame
  purrr::map_df(read.csv) -> combined_data
Note that I have added head to test the answer on only the first 6 files. You can remove that line when you want to read all the files from the directory.
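If the markup of the GitHub directory page changes, the scraping step above can break. As an alternative sketch (not part of the original answer), you could list the files through GitHub's contents API instead of parsing the HTML; this assumes the jsonlite package is installed and relies on the download_url field that the API currently returns for each entry:
library(jsonlite)

# one record per entry in the folder, including a direct raw download URL
files <- fromJSON("https://api.github.com/repos/pcm-dpc/COVID-19/contents/dati-regioni")

# keep only the CSV files, then read and row-bind them as before
csv_urls <- grep('\\.csv$', files$download_url, value = TRUE)
combined_data <- purrr::map_df(csv_urls, read.csv)
Unauthenticated API requests are rate-limited and the listing is capped at 1,000 entries per directory, so treat this only as a rough alternative.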
Upvotes: 2