Reputation: 67

Deleting variables for all files within a folder?

I have many files within a folder and wish to delete a number of columns for each and insert a new column. I am able to do this one file at a time with the following code:

df <- read.csv("C:\\Users\\name\\Documents\\CSV Files\\1\\30335\\file1.csv")
df <- df[-c(6:34)]
df$newcolumn <- df$column1-df$column2
write.table(df, file = "C:\\Users\\name\\Documents\\CSV Files\\1\\30335\\file1.csv",
            sep = ",", dec = ".", col.names = TRUE, row.names = FALSE)

However, I wish to do it for all the files within the folder by running the code only once.

Thanks in advance for the help.

Upvotes: 1

Answers (2)

Gavin Simpson

Reputation: 174948

First, some dummy data to work with:

for (i in seq_len(3)) {
  df <- data.frame(A = runif(10), B = runif(10), C = runif(10))
  fname <- file.path("~", paste0("df", i, ".csv")) ## same folder as `path` below
  write.csv(df, fname, row.names = FALSE)
}

OK, now list the .csv files in the directory:

path <- "~/"
fs <- list.files(path, pattern = glob2rx("*.csv"))

which gives

R> fs
[1] "df1.csv" "df2.csv" "df3.csv"

Next, loop over the set of files

for (f in fs) {
  fname <- file.path(path, f)             ## current file name
  df <- read.csv(fname)                   ## read file
  df <- df[, -2]                          ## delete column B
  df$D <- df[, 1] + df[, 2]               ## add new column D = A + C
  write.csv(df, fname, row.names = FALSE) ## write it out
}

That's it, but just check it worked:

R> read.csv(file.path(path, fs[1]))
         A        C      D
1  0.71253 0.405461 1.1180
2  0.83507 0.353672 1.1887
3  0.61541 0.018851 0.6343
4  0.92108 0.006301 0.9274
5  0.07466 0.570673 0.6453
6  0.81803 0.160932 0.9790
7  0.50841 0.935930 1.4443
8  0.64912 0.965246 1.6144
9  0.31503 0.946411 1.2614
10 0.41563 0.212671 0.6283
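Rather than eyeballing one printed data frame, you could check every file programmatically. A minimal sketch, assuming the same A/B/C dummy layout; it works in tempdir() so it runs on its own, and the helper name check_processed() is just for illustration:

```r
## Self-contained: create dummy files in a temporary directory
path <- tempdir()
for (i in seq_len(3)) {
  df <- data.frame(A = runif(10), B = runif(10), C = runif(10))
  write.csv(df, file.path(path, paste0("chk", i, ".csv")), row.names = FALSE)
}
fs <- list.files(path, pattern = glob2rx("chk*.csv"))

## Process them exactly as in the loop above
for (f in fs) {
  fname <- file.path(path, f)
  df <- read.csv(fname)
  df <- df[, -2]                 ## delete column B
  df$D <- df[, 1] + df[, 2]      ## add new column D = A + C
  write.csv(df, fname, row.names = FALSE)
}

## Hypothetical helper: TRUE if a file has columns A, C, D and D == A + C
check_processed <- function(fname) {
  res <- read.csv(fname)
  identical(names(res), c("A", "C", "D")) &&
    isTRUE(all.equal(res$D, res$A + res$C))
}

ok <- vapply(file.path(path, fs), check_processed, logical(1))
all(ok)
```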

The full script is:

path <- "~/"
fs <- list.files(path, pattern = glob2rx("*.csv"))
for (f in fs) {
  fname <- file.path(path, f)             ## current file name
  df <- read.csv(fname)                   ## read file
  df <- df[, -2]                          ## delete column B
  df$D <- df[, 1] + df[, 2]               ## add new column D = A + C
  write.csv(df, fname, row.names = FALSE) ## write it out
}

The glob2rx() call converts the file-name glob into a regular expression so that only files with the .csv extension are selected. If you know regular expressions you could write that yourself, but glob2rx() is a nice shortcut for those who don't speak regexps.
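To see what glob2rx() actually produces, and a hand-written equivalent, here is a short sketch on a made-up vector of file names. The ignore.case = TRUE variant is my addition; list.files() accepts the same argument, which helps if some files end in .CSV:

```r
## The regular expression glob2rx() builds from the glob "*.csv"
rx <- glob2rx("*.csv")
rx
## [1] "^.*\\.csv$"

## Made-up file names to test the patterns against
files <- c("df1.csv", "df2.csv", "notes.txt", "data.csv.bak", "OLD.CSV")

## Only names ending exactly in ".csv" match
files[grepl(rx, files)]

## Hand-written, case-insensitive equivalent also picks up "OLD.CSV"
files[grepl("\\.csv$", files, ignore.case = TRUE)]
```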

Essentially, the solution above and Sven's answer are very similar. I prefer the loop here: creating an anonymous function, although not at all difficult, is one step removed from the actual problem, which is simply to carry out a sequence of steps one after the other, and to my mind a loop demonstrates that most clearly. But that is purely personal preference.
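If you want both styles to share the same code, the per-file steps can be pulled out into a named function. A sketch using the A/B/C dummy layout; process_file() is a hypothetical name, and tempdir() keeps it self-contained:

```r
## Hypothetical helper: the per-file steps from the loop above
process_file <- function(fname) {
  df <- read.csv(fname)
  df <- df[, -2]                 ## delete column B
  df$D <- df[, 1] + df[, 2]      ## add new column D = A + C
  write.csv(df, fname, row.names = FALSE)
  invisible(fname)
}

## Dummy files in a temporary directory
path <- tempdir()
for (i in seq_len(2)) {
  write.csv(data.frame(A = runif(5), B = runif(5), C = runif(5)),
            file.path(path, paste0("cmp", i, ".csv")), row.names = FALSE)
}
## full.names = TRUE returns complete paths, so no file.path() is needed later
fs <- list.files(path, pattern = glob2rx("cmp*.csv"), full.names = TRUE)

## lapply() style; the loop equivalent is: for (f in fs) process_file(f)
invisible(lapply(fs, process_file))
```

Either way the work happens in process_file(); the choice between loop and lapply() is then only about how you iterate.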

For your specific example (untested, as I don't have your setup), you would need:

path <- "C:\\Users\\name\\Documents\\CSV Files\\1\\30335"
fs <- list.files(path, pattern = glob2rx("*.csv"))
for (f in fs) {
  fname <- file.path(path, f)               ## current file name
  df <- read.csv(fname)                     ## read file
  df <- df[, -(6:34)]                       ## delete columns
  df$newcolumn <- df[, "column1"] - df[, "column2"] ## add new column
  write.csv(df, fname, row.names = FALSE)   ## write it out
}
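Because this overwrites each file in place, running it a second time would delete columns 6 to 34 of the already-trimmed data (or fail if too few columns remain). A safer variant, sketched under the assumption that a sub-folder named `processed` (a hypothetical name, as is the helper process_folder()) is acceptable, writes the results elsewhere and leaves the originals untouched. The demo uses tempdir() and a made-up 35-column file so it runs on its own:

```r
## Hypothetical helper: process every CSV in `in_dir`, writing the results
## to `out_dir` so that the original files are never overwritten.
process_folder <- function(in_dir, out_dir, drop = 6:34) {
  dir.create(out_dir, showWarnings = FALSE)
  fs <- list.files(in_dir, pattern = "\\.csv$")   ## not recursive
  for (f in fs) {
    df <- read.csv(file.path(in_dir, f))
    df <- df[, -drop]                             ## delete columns
    df$newcolumn <- df$column1 - df$column2       ## add new column
    write.csv(df, file.path(out_dir, f), row.names = FALSE)
  }
  invisible(fs)
}

## Made-up input: 35 columns, the first two named column1 and column2
in_dir <- file.path(tempdir(), "demo_in")
dir.create(in_dir, showWarnings = FALSE)
df <- as.data.frame(matrix(runif(5 * 35), nrow = 5))
names(df)[1:2] <- c("column1", "column2")
write.csv(df, file.path(in_dir, "file1.csv"), row.names = FALSE)

out_dir <- file.path(in_dir, "processed")
process_folder(in_dir, out_dir)
res <- read.csv(file.path(out_dir, "file1.csv"))
names(res)
```

Re-running process_folder() is then harmless, since it always starts from the untouched originals.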

Upvotes: 4

Sven Hohenstein

Reputation: 81743

Here's one approach:

path <- "C:\\Users\\name\\Documents\\CSV Files\\1\\30335"

files <- list.files(path = path, pattern = "\\.csv$")

invisible(lapply(files, function(file) {
  fp <- file.path(path, file)
  df <- read.csv(fp)[-(6:34)]
  df$newcolumn <- df$column1 - df$column2
  write.table(df, file = fp,
              sep = ",", dec = ".", col.names = TRUE, row.names = FALSE)
}))
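The column index needs to be written -(6:34), not -6:34. In R, unary minus binds more tightly than `:`, so -6:34 is the sequence -6, -5, ..., 34, which mixes negative and positive subscripts and makes the subset fail. A small sketch with a made-up 40-column data frame:

```r
## Unary minus binds more tightly than `:`
x <- -6:34        ## the sequence -6, -5, ..., 34
y <- -(6:34)      ## the sequence -6, -7, ..., -34

df <- as.data.frame(matrix(1:40, nrow = 1))   ## made-up 40-column data frame

## Mixing negative and positive subscripts raises an error
failed <- tryCatch({ df[x]; FALSE }, error = function(e) TRUE)
failed

## Negating the whole sequence drops columns 6 to 34, leaving 11 columns
kept <- df[y]
names(kept)
```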

Upvotes: 2
