ℕʘʘḆḽḘ

Reputation: 19395

how to download and unzip efficiently large files from a http link?

I have a list of files such as

mylist <- c('http://myweb/myzip1.gz',
            'http://myweb/myzip2.gz',
            'http://myweb/myzip3.gz')

I need to download them and unzip them to another path D://mydata/.

Right now, I have used purrr and download.file

#get files
library(purrr)

myfunc <- function(mystring){
  download.file(mystring,
                destfile = paste0('D://mydata/', basename(mystring)))
}

#download data
map(mylist, myfunc)

but after a few hours of downloading (each file is 10GB+), RStudio freezes (even though the downloads still continue in the background).

Is there a more efficient way? I would like to keep track of the downloads in R without it freezing at some point.

Thanks!

Upvotes: 0

Views: 396

Answers (1)

JonMinton

Reputation: 1279

I don't think the information above is enough to give 'an answer' as a single code chunk, but there are a few things you could do that, collectively, would solve the problem:

  1. Try running R in terminal mode rather than in the RStudio IDE proper. (This is accessible from newer versions of RStudio.)
  2. 'Chunk' the task into smaller batches; for example, you could split the list of filenames using seq_along(mylist) %/% N, where N is the chunk size. Consider using a for loop for iterating between batches, and purrr only within each batch (see the sketch after this list).
  3. Explicitly remove files you have recently loaded into the R environment, then make explicit calls to the garbage collector, gc(), to free them from RAM.
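
As a rough illustration of combining points 2 and 3, here is a minimal sketch. The batch size N, the use of walk(), and mode = 'wb' are my assumptions; the URLs and destination folder are taken from the question:

library(purrr)

mylist <- c('http://myweb/myzip1.gz',
            'http://myweb/myzip2.gz',
            'http://myweb/myzip3.gz')

# hypothetical batch size -- adjust to suit your connection and disk
N <- 2

# group the URLs into batches using the index suggested in point 2
batches <- split(mylist, seq_along(mylist) %/% N)

myfunc <- function(mystring){
  download.file(mystring,
                destfile = paste0('D://mydata/', basename(mystring)),
                mode = 'wb')  # binary mode for .gz archives
}

# for loop between batches, purrr only within each batch (point 2)
for (batch in batches) {
  walk(batch, myfunc)  # walk() is map() used purely for side effects

  # drop the batch object and trigger garbage collection between batches (point 3)
  rm(batch)
  gc()
}

This keeps each purrr call small, and the explicit rm()/gc() between batches gives R a chance to release memory before the next set of downloads starts.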

Upvotes: 2
