Reputation: 67
I have many files in a folder. For each file I want to delete a number of columns and insert a new column. I am able to do this one file at a time with the following code:
df <- read.csv("C:\\Users\\name\\Documents\\CSV Files\\1\\30335\\file1.csv")
df <- df[-c(6:34)]
df$newcolumn <- df$column1-df$column2
write.table(df, file = "C:\\Users\\name\\Documents\\CSV Files\\1\\30335\\file1.csv",
sep = ",", dec = ".", col.names = T, row.names = F)
However, I would like to do this for every file in the folder by running the code just once.
Thanks in advance for the help.
Upvotes: 1
Views: 537
Reputation: 174948
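First, some dummy data to work with. The files are written to the current working directory, so the code that follows assumes this is also your home directory (~/):
for (i in seq_len(3)) {
    df <- data.frame(A = runif(10), B = runif(10), C = runif(10))
    fname <- paste0("./df", i, ".csv")
    write.csv(df, fname, row.names = FALSE)
}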
OK, first list the .csv files in the directory:
path <- "~/"
fs <- list.files(path, pattern = glob2rx("*.csv"))
which gives
R> fs
[1] "df1.csv" "df2.csv" "df3.csv"
Next, loop over the set of files
for (f in fs) {
    fname <- file.path(path, f)               ## current file name
    df <- read.csv(fname)                     ## read file
    df <- df[, -2]                            ## delete column B
    df$D <- df[, 1] + df[, 2]                 ## add something
    write.csv(df, fname, row.names = FALSE)   ## write it out
}
That's it, but just check it worked:
R> read.csv(file.path(path, fs[1]))
A C D
1 0.71253 0.405461 1.1180
2 0.83507 0.353672 1.1887
3 0.61541 0.018851 0.6343
4 0.92108 0.006301 0.9274
5 0.07466 0.570673 0.6453
6 0.81803 0.160932 0.9790
7 0.50841 0.935930 1.4443
8 0.64912 0.965246 1.6144
9 0.31503 0.946411 1.2614
10 0.41563 0.212671 0.6283
The full script is:
path <- "~/"
fs <- list.files(path, pattern = glob2rx("*.csv"))
for (f in fs) {
    fname <- file.path(path, f)               ## current file name
    df <- read.csv(fname)                     ## read file
    df <- df[, -2]                            ## delete column B
    df$D <- df[, 1] + df[, 2]                 ## add something
    write.csv(df, fname, row.names = FALSE)   ## write it out
}
The glob2rx() call converts the file pattern glob into a regular expression so that only files with the .csv extension are selected. If you know regular expressions you could write the pattern yourself, but glob2rx() is a nice shortcut for those who don't speak regexps.
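To make the glob-to-regex step concrete, here is a small sketch that applies the converted pattern to a handful of file names and compares it with a hand-written regular expression. The names in `candidates` are made up for illustration and do not come from the question.

```r
# glob2rx() lives in the utils package, which is attached by default.
rx <- utils::glob2rx("*.csv")
rx  # print the regular expression that the glob expands to

# Hypothetical file names, only to show which ones the pattern keeps.
candidates <- c("df1.csv", "notes.txt", "df2.csv.bak", "csv_summary.R")
matched <- grepl(rx, candidates)
candidates[matched]

# A hand-written pattern that anchors on the extension selects the same files.
matched_by_hand <- grepl("\\.csv$", candidates)
identical(matched, matched_by_hand)
```

Either pattern can be passed as the pattern argument of list.files(); which one to use is a matter of taste.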
Essentially, the solution above and the one in Sven's answer are very similar. I prefer the loop approach here. Creating an anonymous function is not at all difficult, but it is one step removed from the actual problem, which is just to do a sequence of steps one after the other. To my mind, that is most clearly demonstrated via a loop, but that is purely personal preference.
For your specific example (untested, as I don't have your setup), you would need:
path <- "C:\\Users\\name\\Documents\\CSV Files\\1\\30335"
fs <- list.files(path, pattern = glob2rx("*.csv"))
for (f in fs) {
    fname <- file.path(path, f)                               ## current file name
    df <- read.csv(fname)                                     ## read file
    df <- df[, -(6:34)]                                       ## delete columns 6 to 34
    df$newcolumn <- df[, "column1"] - df[, "column2"]         ## add new column, as in the question
    write.csv(df, fname, row.names = FALSE)                   ## write it out
}
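Because the loop overwrites each file in place, it may be worth rehearsing it on throwaway data first. The following is a minimal sketch under assumptions: the directory name csv_dry_run, the file names, and the 40-column layout with names column1 to column40 are all invented for the test, and the new column is computed as the difference used in the question.

```r
# Work in a fresh temporary directory so no real files are touched.
path <- file.path(tempdir(), "csv_dry_run")
dir.create(path, showWarnings = FALSE)

# Hypothetical test data: 40 numeric columns named column1 ... column40.
for (i in seq_len(2)) {
  df <- as.data.frame(matrix(runif(5 * 40), nrow = 5))
  names(df) <- paste0("column", seq_len(40))
  write.csv(df, file.path(path, paste0("file", i, ".csv")), row.names = FALSE)
}

# full.names = TRUE returns complete paths, so file.path() is not needed later.
fs <- list.files(path, pattern = "\\.csv$", full.names = TRUE)

for (fname in fs) {
  df <- read.csv(fname)                     # read file
  df <- df[, -(6:34)]                       # drop columns 6 to 34
  df$newcolumn <- df$column1 - df$column2   # difference of two columns
  write.csv(df, fname, row.names = FALSE)   # overwrite the file
}

# Inspect one result: columns 1 to 5, 35 to 40, and the new column remain.
result <- read.csv(fs[1])
names(result)
```

Note that running the loop a second time on the same files would drop further columns, so keeping a backup of the originals is a sensible precaution.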
Upvotes: 4
Reputation: 81743
Here's one approach:
path <- "C:\\Users\\name\\Documents\\CSV Files\\1\\30335"
files <- list.files(path = path)
lapply(files, function(file) {
  fp <- file.path(path, file)
  df <- read.csv(fp)[-(6:34)]   # parentheses needed: -6:34 would mean (-6):34
  df$newcolumn <- df$column1 - df$column2
  write.table(df, file = fp,
              sep = ",", dec = ".", col.names = TRUE, row.names = FALSE)
})
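When dropping a range of columns with a negative index, the parentheses around the range matter. The sketch below illustrates why, using a made-up 40-column data frame that stands in for the real files.

```r
# Unary minus binds more tightly than the colon operator,
# so -6:34 is the sequence from -6 up to 34, not "minus columns 6 to 34".
mixed <- -6:34
range(mixed)

# Hypothetical 40-column data frame, used only to show the indexing behaviour.
df <- as.data.frame(matrix(0, nrow = 2, ncol = 40))

# Mixing negative and positive subscripts is an error, captured here with try().
attempt <- try(df[-6:34], silent = TRUE)
inherits(attempt, "try-error")

# Parenthesising the range negates every element and drops columns 6 to 34.
kept <- df[-(6:34)]
ncol(kept)
```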
Upvotes: 2