R: download all files in a Google Drive public folder

Question

I'm trying to get data for RAIS (a Brazilian employee registry dataset) that is shared using a Google Drive public folder. This is the address: https://drive.google.com/folderview?id=0ByKsqUnItyBhZmNwaXpnNXBHMzQ&usp=sharing&tid=0ByKsqUnItyBhU2RmdUloTnJGRGM#list

The data is divided into one folder per year, and within each folder there is one file per state to download. I would like to automate the downloading process in R for all years or, failing that, at least within each year folder. Downloaded files should keep the same names they get when downloaded manually.

I know a little R, but no web programming or web scraping. This is what I have so far: by manually downloading the first of the 2012 files, I could see the URL my browser used for the download: https://drive.google.com/uc?id=0ByKsqUnItyBhS2RQdFJ2Q0RrN0k&export=download

Thus, I suppose the file id is: 0ByKsqUnItyBhS2RQdFJ2Q0RrN0k

Searching the HTML source of the 2012 page, I was able to find that ID and the file name associated with it: AC2012.7z. All the other IDs and file names are in the same section of the HTML source. So, assuming I can download the file correctly, I suppose I could at least generalize to the other files.
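If the download itself worked, I imagine the generalization would look roughly like the sketch below. It assumes the IDs and names have already been copied out of the page source by hand, and that every file follows the same URL pattern my browser used. `drive_download_url` and `files` are hypothetical names of my own, and only the first ID/name pair is one I have actually seen.

```r
# Hypothetical helper: builds a direct-download URL from a Drive file ID,
# following the URL pattern observed in the browser (an assumption that
# the same pattern holds for every file in the folder).
drive_download_url <- function(id) {
  paste0("https://drive.google.com/uc?id=", id, "&export=download")
}

# IDs and file names copied from the page source.
# Only this first pair is known so far; further rows would be added by hand.
files <- data.frame(
  id   = "0ByKsqUnItyBhS2RQdFJ2Q0RrN0k",
  name = "AC2012.7z",
  stringsAsFactors = FALSE
)

# paste0() is vectorized, so this yields one URL per row.
files$url <- drive_download_url(files$id)
```

Each row would then give a URL and the destination file name to pass to the download step.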

In R, I tried the following code to download the file:

url <- "https://drive.google.com/uc?id=0ByKsqUnItyBhS2RQdFJ2Q0RrN0k&export=download"
download.file(url,"AC2012.7z")
unzip("AC2012.7z")

It does download, but I get an error when trying to uncompress the file (both within R and manually with 7-Zip). There must be something wrong with the file downloaded in R, as its size (3.412 KB) does not match what I get from manually downloading the file (3.399 KB).
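One thing I have not ruled out is that download.file() is writing the file in text mode: on Windows the default mode can alter binary files, and `mode = "wb"` asks for the bytes to be written unchanged. That might explain the slightly larger file, though I am not certain it is the cause here. Also, as far as I understand, unzip() only handles zip archives, so a .7z file would need an external tool. Below is a sketch of what I would try next; `fetch_binary` is a hypothetical helper name, and the commented-out extraction step assumes a `7z` executable is available on the PATH.

```r
# Hypothetical helper: downloads a file in binary mode so that the bytes
# are written to disk unchanged (relevant on Windows, where the default
# text mode can corrupt binary files such as .7z archives).
fetch_binary <- function(url, destfile) {
  download.file(url, destfile, mode = "wb", quiet = TRUE)
  invisible(destfile)
}

# Usage (left commented out because it needs network access):
# url <- "https://drive.google.com/uc?id=0ByKsqUnItyBhS2RQdFJ2Q0RrN0k&export=download"
# fetch_binary(url, "AC2012.7z")
#
# Extraction with an external 7-Zip command line, assuming `7z` is on the PATH
# ("x" extracts with full paths):
# system2("7z", c("x", "AC2012.7z"))
```

If the file size then matches the manual download, the text-mode write was probably the problem.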
