Applying a function on all csv files from a certain folder

Question

I am reading CSV files from a folder; they all have the same structure. I have also written a function that adds a value to a data frame.

I have written both the folder-reading part and the function, but I now need to connect the two. This is where I am stuck:

Here is my code:

addValue <- function(valueToAdd, df.file, writterPath) {
    df.file$result <- df.file$Value + valueToAdd
    x <- x + 1 
    df.file <- as.data.frame(do.call(cbind, df.file))
    fullFilePath <- paste(writterPath, x , "myFile.csv", sep="")
    write.csv(as.data.frame(df.file), fullFilePath)
}

#1.reading R files
path <- "C:/Users/RFiles/files/"
files <- list.files(path=path, pattern="*.csv")
for(file in files)
{
  perpos <- which(strsplit(file, "")[[1]]==".")
  assign(
    gsub(" ","",substr(file, 1, perpos-1)), 
    read.csv(paste(path,file,sep="")))
}

#2.applying function
writterPath <- "C:/Users/RFiles/files/results/"
addValue(2, sys, writterPath)

How can I apply the addValue() function inside my #1.reading R files construct? Any recommendations?

I appreciate your answers!

UPDATE

When trying out the example code, I get:

+   }
+   ## If you really need to change filenames with numbers,
+   newfname <- file.path(npath, paste0(x, basename(fname)))
+   ## otherwise just use `file.path(npath, basename(fname))`.
+   
+   ## (4) Write back to a different file location:
+   write.csv(newdat, file = newfname, row.names = FALSE)
+ }
Error in `$<-.data.frame`(`*tmp*`, "results", value = numeric(0)) : 
  replacement has 0 rows, data has 11

Any suggestions?

r2evans · Accepted Answer

There are several problems with your code (e.g., x in your function is never defined and is not retained between calls to addValue), so I'm guessing that this is a chopped-down version of the real code and you still have remnants remaining. Instead of picking it apart verbosely, I'll just offer my own suggested code and a few pointers.

The function addValue looks like it is good for changing a data.frame, but I would not have guessed (by the name, at least) that it would also write the file to disk (and potentially over-write an existing file).

I'm guessing you are trying to (1) read a file, (2) "add value" to it, (3) assign it to a global variable, and (4) write it to disk. The third can be problematic (and contentious with some programmers), but I'll leave it for now.

Unless writing to disk is inherent to your idea of "adding value" to a data.frame, I recommend you keep #2 separate from #4. Below is a suggested alternative to your code:

addValue <- function(valueToAdd, df) {
    df$results <- df$Value + valueToAdd
    ## more stuff here?
    return(df)
}

opath <- 'c:/Users/RFiles/files/raw'     # notice the difference
npath <- 'c:/Users/RFiles/files/adjusted'
## `pattern` is a regular expression, not a shell glob
files <- list.files(path = opath, pattern = '\\.csv$', full.names = TRUE)

x <- 0
for (fname in files) {
    x <- x + 1
    ## (1) read in and (2) "add value" to it
    dat <- read.csv(fname)
    newdat <- addValue(2, dat)

    ## (3) Conditionally assign to a global variable:
    varname <- gsub('\\.[^.]*$', '', basename(fname))   # strip the file extension
    if (! exists(varname)) {
        assign(x = varname, value = newdat)
    } else {
        warning('variable exists, did not overwrite: ', varname)
    }
    ## If you really need to change filenames with numbers,
    newfname <- file.path(npath, paste0(x, basename(fname)))
    ## otherwise just use `file.path(npath, basename(fname))`.

    ## (4) Write back to a different file location:
    write.csv(newdat, file = newfname, row.names = FALSE)
}

Notice that it will not overwrite global variables. This may be an annoying check, but will keep you from losing data if you accidentally run this section of code.
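
If the loop stops with "replacement has 0 rows, data has 11", as in the question's update, a likely cause is that the file that was read has no column named exactly Value: df$Value is then NULL, and NULL + 2 is a zero-length vector that cannot fill 11 rows. Below is a minimal sketch of a stricter variant that fails early with a clearer message. The name addValueChecked, its column argument, and the two sample data frames are made up for illustration; they are not part of your code.

```r
## Hypothetical stricter variant of addValue(): stop with a readable
## message when the expected column is missing.
addValueChecked <- function(valueToAdd, df, column = "Value") {
    if (!column %in% names(df)) {
        stop("column '", column, "' not found; available columns: ",
             paste(names(df), collapse = ", "))
    }
    df$results <- df[[column]] + valueToAdd
    df
}

## Illustrative data only
good <- data.frame(Value = c(1, 2, 3))
bad  <- data.frame(value = c(1, 2, 3))   # lower-case name, so no "Value" column

addValueChecked(2, good)$results

## Capture the error message instead of aborting the script
res <- tryCatch(addValueChecked(2, bad),
                error = function(e) conditionMessage(e))
res
```

Running names(dat) right after read.csv() is a quick way to see what the columns are actually called.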

An alternative to assigning numerous variables in the global environment is to save all of them in a single list. Assuming they share the same format, you will likely analyze them with identical (or very similar) methods, and keeping them in one list makes that easier. Tracking many separate variable names gets tiresome.

## addValue as defined previously
opath <- 'c:/Users/RFiles/files/raw'
npath <- 'c:/Users/RFiles/files/adjusted'
## `pattern` is a regular expression, not a shell glob
ofiles <- list.files(path = opath, pattern = '\\.csv$', full.names = TRUE)
nfiles <- file.path(npath, basename(ofiles))

dats <- mapply(function(ofname, nfname) {
    dat <- read.csv(ofname)
    newdat <- addValue(2, dat)
    write.csv(newdat, file = nfname, row.names = FALSE)
    newdat
}, ofiles, nfiles, SIMPLIFY = FALSE)
length(dats)                            # number of files
names(dats)                             # one for each file
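
Once the data frames sit in one list, the same operation can be applied to each element, or they can be stacked into a single data.frame. Here is a small sketch of both, using a made-up two-element list as a stand-in for dats (the names and values are illustrative, not taken from your files):

```r
## Stand-in for the `dats` list; element names mimic file names.
dats <- list(
    "a.csv" = data.frame(Value = c(1, 2), results = c(3, 4)),
    "b.csv" = data.frame(Value = c(10, 20, 30), results = c(12, 22, 32))
)

## One summary number per file
means <- sapply(dats, function(d) mean(d$results))

## Stack everything into one data.frame, keeping track of the source
combined <- do.call(rbind, Map(function(d, nm) {
    d$source <- nm
    d
}, dats, names(dats)))

means
nrow(combined)
```

Stacking with rbind assumes every file has the same columns, which matches the "same structure" you described.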
