user2711113

Reputation: 81

R: How to iterate loops over every file in a folder?

I am struggling to iterate 2 loops over all the files in a folder. I have over 600 .csv files, which contain information about the latency and duration of saccades made in a sentence. They look like this:

order  subject  sentence  latency  duration 
1       1        1         641      76 
2       1        1         98       57
3       1        1         252      49
4       1        1         229      43

For each of the files, I want to create 2 new columns called Start and End, to calculate the start and end point of each saccade. The values in each of those are calculated from the values in the latency and duration columns. I can do this using a loop for each file, like so:

SentFile = read.csv(file.choose(), header = TRUE, sep = ",")

# Calculate Start: previous Start + previous duration + current latency
for (i in 1:(nrow(SentFile)-1)){
    SentFile$Start[1] = SentFile$latency[1]
    SentFile$Start[i+1] = SentFile$Start[i] +
        SentFile$duration[i] + SentFile$latency[i+1]
}

# Calculate End: Start + duration, for every row including the last
for (i in 1:nrow(SentFile)){
    SentFile$End[i] = SentFile$Start[i] + SentFile$duration[i]
}

And then the result looks like this:

order  subject  sentence  latency  duration  Start  End 
1       1        1         641      76        641   717 
2       1        1         98       57        815   872
3       1        1         252      49        1124  1173
4       1        1         229      43        1402  1445

I am sure there is probably a more efficient way of doing it, but it is very important to use the precise cells specified in the loop to calculate the Start and End values and that was the only way I could think of to get it to work for each individual file.

As I said, I have over 600 files, and I want to be able to calculate the Start and End values for the entire set and add the new columns to each file. I tried using lapply, like this:

sent_files = list.files()
lapply(sent_files, function(x){
    SentFile = read.csv(x, header = TRUE, sep = ",")

    # Calculate Start of Saccade Absolute Time Stamp
    for (i in 1:(nrow(SentFile)-1)){
        SentFile$Start[1] = SentFile$latency[1]
        SentFile$Start[i+1] = SentFile$Start[i] + SentFile$duration[i] +
            SentFile$latency[i+1]
    }

    # Calculate End of Saccade Absolute Time Stamp
    for (i in 1:nrow(SentFile)){
        SentFile$End[i] = SentFile$Start[i] + SentFile$duration[i]
    }
})

However, I keep getting this error message:

Error in `$<-.data.frame`(`*tmp*`, "SacStart", value = c(2934L, NA)):replacement has 2 rows, data has 1 

I would really appreciate any help in getting this to work!

Upvotes: 1

Views: 90

Answers (1)

Bulat

Reputation: 6969

First, replace the for loops with vectorized operations:

data <- data.frame(
  order    = c(1, 2, 3, 4),
  subject  = c(1, 1, 1, 1),
  sentence = c(1, 1, 1, 1),
  latency  = c(641, 98, 252, 229),
  duration = c(76, 57, 49, 43)
)

data$end <- cumsum(data$latency + data$duration)
data$start <- data$end - data$duration
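
As a quick check that the vectorized version reproduces the table in the question, here is a minimal sketch that wraps it in a helper. The name `add_start_end` is hypothetical, and the lowercase column names `latency` and `duration` are an assumption taken from the sample data shown in the question.

```r
# Hypothetical helper: adds End and Start columns to one data frame.
# Assumes columns named `latency` and `duration` (lowercase, as in the sample).
add_start_end <- function(df) {
  # End of saccade i = sum of all latencies and durations up to and including i
  df$End <- cumsum(df$latency + df$duration)
  # Start of saccade i = its End minus its own duration
  df$Start <- df$End - df$duration
  df
}

# Sample rows copied from the question
sample_data <- data.frame(
  order    = 1:4,
  subject  = 1,
  sentence = 1,
  latency  = c(641, 98, 252, 229),
  duration = c(76, 57, 49, 43)
)

result <- add_start_end(sample_data)
print(result[, c("latency", "duration", "Start", "End")])
```

Because nothing here indexes row `i + 1`, the same function also works on a file that has only one row.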

Second, the function you pass to lapply never returns the modified data frame, and the result of lapply is not assigned to a variable, so nothing you compute is kept.

If you want to process all files in one go, change the code for data load to this:

data.list <- lapply(sent_files, function(x){
 data <- read.csv(x, header = TRUE, sep = ",")
 return(data)
})
data <- do.call("rbind", data.list)
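
One caveat: once the files are combined with rbind, a single cumsum would keep accumulating across file boundaries. If Start and End should restart in every file, the calculation probably has to happen per file, before combining. Below is a minimal sketch of that, assuming lowercase `latency` and `duration` columns; the function name `process_folder` and the folder names in the usage comment are placeholders, not part of the original post. It also sidesteps what is likely the cause of the error in the question: for a file with a single row, `1:(nrow(SentFile)-1)` evaluates to `1:0`, which is `c(1, 0)`, so the loop still runs and tries to write to a second row.

```r
# Sketch: compute Start/End for each CSV separately, save a copy, return a list.
# Assumes lowercase `latency` and `duration` columns; `in_dir` and `out_dir`
# are placeholder arguments, not taken from the original post.
process_folder <- function(in_dir, out_dir) {
  dir.create(out_dir, showWarnings = FALSE)

  # Only pick up .csv files; full paths so the working directory does not matter
  files <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)

  results <- lapply(files, function(path) {
    df <- read.csv(path, header = TRUE)

    # Vectorized per-file calculation (no row indexing, so 1-row files are fine)
    df$End   <- cumsum(df$latency + df$duration)
    df$Start <- df$End - df$duration

    # Write the augmented file under the same name into the output folder
    write.csv(df, file.path(out_dir, basename(path)), row.names = FALSE)

    df  # return the modified data frame so lapply keeps it
  })

  names(results) <- basename(files)
  results
}

# Example use with placeholder folder names:
# all_files <- process_folder("saccade_csvs", "with_start_end")
# combined  <- do.call("rbind", all_files)
```

Writing to a separate output folder leaves the original files untouched, which makes it easy to rerun the script if the column names turn out to differ.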

Upvotes: 1
