Thorst
Thorst

Reputation: 1610

R download and read many excel files, automatically

I need to download a few hundred number of excel files and import them into R each day. Each one should be their own data-frame. I have a csv. file with all the adresses (the adresses remains static).

The csv. file looks like this:

http://www.www.somehomepage.com/chartserver/hometolotsoffiles%a
http://www.www.somehomepage.com/chartserver/hometolotsoffiles%b
http://www.www.somehomepage.com/chartserver/hometolotsoffiles%a0
http://www.www.somehomepage.com/chartserver/hometolotsoffiles%aa11
etc.....

I can do it with a single file like this:

library(XLConnect)

my.url <- "http://www.somehomepage.com/chartserver/hometolotsoffiles%a"

loc.download <- "C:/R/lotsofdata/" # each files probably needs to have their own name here? 

download.file(my.url, loc.download, mode="wb")

df.import.x1 = readWorksheetFromFile("loc.download", sheet=2)) 

# This kind of import works on all the files, if you ran them individually

But I have no idea how to download each file, and place it separately in a folder, and then import them all into R as individual data frames.

Upvotes: 0

Views: 2504

Answers (1)

csgillespie
csgillespie

Reputation: 60522

It's hard to answer your question as you haven't provided a reproducible example and it isn't clear what you exactly want. Anyway, the code below should point you in the right direction.

  1. You have a list of urls you want to visit:

    urls = c("http://www/chartserver/hometolotsoffiles%a",
                 "http://www/chartserver/hometolotsoffiles%b")
    

    in your example, you load this from a csv file

  2. Next we download each file and put it in a separate directory (you mentioned that in your question

    for(url in urls) {
      split_url = strsplit(url, "/")[[1]]
      ##Extract final part of URL
      dir = split_url[length(split_url)]
      ##Create a directory
      dir.create(dir)
      ##Download the file
      download.file(url, dir, mode="wb")
    }
    
  3. Then we loop over the directories and files and store the results in a list.

    ##Read in files
    l = list(); i = 1
    dirs = list.dirs("/data/", recursive=FALSE)
    for(dir in dirs){
      file = list.files(dir, full.names=TRUE)
      ##Do something?
      ##Perhaps store sheets as a list
      l[[i]] = readWorksheetFromFile(file, sheet=2)
      i = i + 1
    }
    

We could of course combine steps two and three into a single loop. Or drop the loops and use sapply.

Upvotes: 3

Related Questions