LucasMation

Reputation: 2491

R: download all files in a Google Drive public folder

I'm trying to get data for RAIS (a Brazilian employee registry dataset) that is shared using a Google Drive public folder. This is the address: https://drive.google.com/folderview?id=0ByKsqUnItyBhZmNwaXpnNXBHMzQ&usp=sharing&tid=0ByKsqUnItyBhU2RmdUloTnJGRGM#list

Data is divided into one folder per year, and within each folder there is one file per state to download. I would like to automate the downloading process in R for all years or, failing that, at least within each year folder. Downloaded file names should match the names the files get when downloaded manually.

I know a little R, but no web programming or web scraping. This is what I've got so far: by manually downloading the first of the 2012 files, I could see the URL my browser used to download it: https://drive.google.com/uc?id=0ByKsqUnItyBhS2RQdFJ2Q0RrN0k&export=download

Thus, I suppose the file id is: 0ByKsqUnItyBhS2RQdFJ2Q0RrN0k

Searching the HTML code of the 2012 page, I was able to find that ID and the file name associated with it: AC2012.7z. All the other IDs and file names are in that section of the HTML code. So, assuming I can download the file correctly, I suppose I could at least generalize to the other files.
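As a rough sketch of that generalization (not part of the original attempt): the two helpers below, `extract_drive_id` and `build_download_url`, are hypothetical names. They pull the `id` parameter out of a Drive URL and rebuild the `uc?id=...&export=download` pattern seen in the browser. Only the AC2012.7z ID is known so far, so the table holds that single row; the remaining IDs would still have to be copied from the page's HTML.

```r
# Hypothetical helper: extract the value of the "id" query parameter from a Drive URL.
# Returns NA if the URL has no such parameter.
extract_drive_id <- function(url) {
  m <- regmatches(url, regexpr("(?<=[?&]id=)[^&#]+", url, perl = TRUE))
  if (length(m) == 0) NA_character_ else m
}

# Hypothetical helper: rebuild the direct-download URL pattern observed in the browser.
build_download_url <- function(id) {
  sprintf("https://drive.google.com/uc?id=%s&export=download", id)
}

# One row per file; only the ID found so far is filled in.
files <- data.frame(
  name = "AC2012.7z",
  id   = "0ByKsqUnItyBhS2RQdFJ2Q0RrN0k",
  stringsAsFactors = FALSE
)
files$url <- build_download_url(files$id)
```

With more rows in `files`, the same `url` column could be looped over to fetch each file under its own `name`.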

In R, I tried the following code to download the file:

url <- "https://drive.google.com/uc?id=0ByKsqUnItyBhS2RQdFJ2Q0RrN0k&export=download"
download.file(url,"AC2012.7z")
unzip("AC2012.7z")

It does download, but I get an error when trying to uncompress the file (both within R and manually with 7-Zip). There must be something wrong with the file downloaded in R, as its size (3.412Kb) does not match what I get from manually downloading the file (3.399Kb).
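One possible explanation, offered as an assumption rather than a confirmed diagnosis: on Windows, `download.file` writes in text mode by default, which can alter line-ending bytes in a binary archive and would be consistent with the slightly larger file. Separately, `unzip()` only reads zip archives, so a .7z file needs a different extractor even when the download is intact. A minimal sketch of a binary-safe download, where `download_binary` is a hypothetical wrapper name:

```r
# Hypothetical wrapper: download a file in binary mode so its bytes are written unchanged.
# Returns the size of the downloaded file (in bytes) invisibly.
download_binary <- function(url, destfile) {
  download.file(url, destfile, mode = "wb", quiet = TRUE)
  invisible(file.size(destfile))
}

# Usage (requires network access):
# url <- "https://drive.google.com/uc?id=0ByKsqUnItyBhS2RQdFJ2Q0RrN0k&export=download"
# download_binary(url, "AC2012.7z")
```

Comparing the returned size against the manually downloaded file is a quick check on whether the transfer is now byte-identical.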

Upvotes: 3

Views: 3074

Answers (1)

dmh

Reputation: 1059

For anyone trying to solve this problem today, you can use the googledrive package.

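library(googledrive)

# List the folder contents; as_id() marks the URL as a Drive ID rather than a path
ls_tibble <- drive_ls(as_id(GOOGLE_DRIVE_URL_FOR_THE_TARGET_FOLDER))

# Download each file into the current working directory
for (file_id in ls_tibble$id) {
  drive_download(as_id(file_id))
}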

This will (1) trigger an authentication page to open in your browser to authorise the Tidyverse libraries using gargle to access Google Drive on behalf of your account and (2) download all the files in the folder at that URL to your current working directory for the current R session.
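Since the question asks for every year folder and the folder is public, a variant along the following lines may be closer to what is needed. It is only a sketch under assumptions: `download_public_folder` is a hypothetical name, `drive_deauth()` is used on the assumption that the folder is readable without signing in, and `recursive = TRUE` is assumed to reach the per-year subfolders. Older-style folder IDs may no longer resolve without extra access keys, so the call at the bottom is left commented out.

```r
# Hypothetical helper: download every non-folder file under a public Drive folder,
# keeping the file names shown on Drive. Requires the googledrive package.
download_public_folder <- function(folder_id, dest_dir = ".") {
  # Assumption: the folder is public, so skip the browser sign-in
  googledrive::drive_deauth()

  # List the folder and its subfolders (one subfolder per year in the question)
  contents <- googledrive::drive_ls(googledrive::as_id(folder_id), recursive = TRUE)

  # Drop the folders themselves; only files can be downloaded
  contents <- contents[!googledrive::is_folder(contents), ]

  for (i in seq_len(nrow(contents))) {
    googledrive::drive_download(
      contents[i, ],
      path = file.path(dest_dir, contents$name[i]),
      overwrite = TRUE
    )
  }
  invisible(contents)
}

# Usage (requires network access); the ID is taken from the folder URL in the question:
# download_public_folder("0ByKsqUnItyBhZmNwaXpnNXBHMzQ")
```

All files land in one directory, which works as long as names such as AC2012.7z stay unique across years.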

Upvotes: 1
