Mathieu
Mathieu

Reputation: 67

How to read a csv file or load an excel workbook by ignoring some characters in the file path?

I'm writing a loop script which involves reading a file from a workbook (using the package XLConnect). The challenge is that the file names contain characters (representing time) that I want to ignore.

For example, here are 3 paths to those files:

G://User//Documents//daily_data//Op_Schedule_20160520_132025.xlsx G://User//Documents//daily_data//Op_Schedule_20160521_142805.xlsx G://User//Documents//daily_data//Op_Schedule_20160522_103052.xlsx

I need to import hundreds of those files. I can easily account for the character string representing the date (e.g. 20160522), but not the time.

Is there a way to tell R to ignore some characters located in the file path? Here is how I was thinking of writing my script (the "???" is where i need help). I know a loop is probably not the most efficient way, but i'm open to suggestions, should you have any:

require(XLConnect)
path= "G://User//Documents//daily_data//Op_Schedule_"

wd.seq = format(seq(as.Date("2014-01-01"),as.Date("2016-12-31"),"days"),format="%Y%m%d")

scheduleList = rep(list(matrix(1,1,1)),length(wd.seq))

for(i in 1:length(wd.seq)) { 
wb = loadWorkbook(file= paste0(path,wd.seq[i],"???",".xlxs"))
scheduleList[[i]] = readWorksheet(wb,sheet='=SCHEDULE', header = TRUE)
}

`

Thanks for reading and suggestions, if any.

Mathieu

Upvotes: 1

Views: 258

Answers (1)

Kári Gunnarsson
Kári Gunnarsson

Reputation: 108

I don't know if this is helpful, but if you want to read all the files in a certain directory (which it seems to me is what you're after), you can read all the filenames into a list using the list.files() function, for example

fileList <- list.files(""G://User//Documents//daily_data//")

And then load the xlsx files looping through the list with a for loop

for(i in fileList) {
    loadWorkbook(file = i)
    }

I haven't used the XLConnect function before so that exact code probably doesn't work, but the loop will iterate through all the files in that directory and so you can construct your loading call using the i variable for the filename (it won't be an absolute path though, so you might need to use paste to add the first part of the filepath)

I realize there might be other files in the directory that are not excel files, you could use grepl to select only files containg "OP_Schedule_"

fileListClean <- fileList[grepl("Op_Schedule_",fileList)]

or perhaps only selecting .xlsx files in the directory:

fileListClean <- fileList[grepl(".xlsx",fileList)]

Edit to fit your reply: Since you need to fit it to a sequence, you can do it as you did earlier:

wd.seq = format(seq(as.Date("2014-01-01"),as.Date("2016-12-31"),"days"),format="%Y%m%d")
wd.seq2 <- paste("Op_Schedule_", wd.seq, sep = "")

And then use grepl to only pick files starting with that extensions:

fileListClean <- fileList[grepl(paste(wd.seq2, collapse = "|"), fileList)]

Full disclosure: The last part i got from this SO answer: grep using a character vector with multiple patterns

Upvotes: 1

Related Questions