Reputation: 81
I am struggling to iterate 2 loops over all the files in a folder. I have over 600 .csv files, which contain information about the latency and duration of saccades made in a sentence. They look like this:
order subject sentence latency duration
1 1 1 641 76
2 1 1 98 57
3 1 1 252 49
4 1 1 229 43
For each of the files, I want to create 2 new columns called Start and End, to calculate the start and end point of each saccade. The values in each of those are calculated from the values in the latency and duration columns. I can do this using a loop for each file, like so:
SentFile = read.csv(file.choose(), header = TRUE, sep = ",")
# Calculate Start
for (i in 1:(nrow(SentFile)-1)){
  SentFile$Start[1] = SentFile$Latency[1]
  SentFile$Start[i+1] = SentFile$Start[i] +
    SentFile$Duration[i] + SentFile$Latency[i+1]}
# Calculate End
for (i in 1:(nrow(SentFile)-1)){
  SentFile$End[i] = SentFile$Start[i] + SentFile$Duration[i]}
And then the result looks like this:
order subject sentence latency duration Start End
1 1 1 641 76 641 717
2 1 1 98 57 815 872
3 1 1 252 49 1124 1173
4 1 1 229 43 1402 1445
I am sure there is probably a more efficient way of doing it, but it is very important to use the precise cells specified in the loop to calculate the Start and End values and that was the only way I could think of to get it to work for each individual file.
As I said, I have over 600 files, and I want to be able to calculate the Start and End values for the entire set and add the new columns to each file. I tried using lapply, like this:
sent_files = list.files()
lapply(sent_files, function(x){
  SentFile = read.csv(x, header = TRUE, sep = ",")
  for (i in 1:(nrow(SentFile)-1)){
    SentFile$Start[1] = SentFile$Latency[1]
    SentFile$Start[i+1] = SentFile$Start[i] + SentFile$Duration[i]
    + SentFile$Latency[i+1]}
  # Calculate End of Saccade Absolute Time Stamp
  for (i in 1:(nrow(SentFile)-1)){
    SentFile$End[i] = SentFile$Start[i] + SentFile$Duration[i]}})
However, I keep getting this error message:
Error in `$<-.data.frame`(`*tmp*`, "SacStart", value = c(2934L, NA)) : replacement has 2 rows, data has 1
I would really appreciate any help in getting this to work!
Upvotes: 1
Views: 90
Reputation: 6969
First, replace the for loops with vectorised code:
data <- data.frame(
  order = c(1, 2, 3, 4),
  subject = c(1, 1, 1, 1),
  sentence = c(1, 1, 1, 1),
  latency = c(641, 98, 252, 229),
  duration = c(76, 57, 49, 43)
)
data$end <- cumsum(data$latency + data$duration)
data$start <- data$end - data$duration
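As a quick check, here is a minimal sketch that wraps the two lines above in a function and runs it on the sample rows from the question. The name add_start_end is hypothetical, and the lowercase column names latency and duration are assumed from the sample data. It also shows one plausible reading of the error: for a file with a single row, 1:(nrow(x) - 1) is 1:0, which counts down through 1 and 0, so the loop tries to write to a second row that does not exist. The vectorised version has no such edge case.

```r
# Hypothetical helper (not from the question): adds Start and End columns.
# Assumes the data frame has numeric columns named latency and duration.
add_start_end <- function(df) {
  end <- cumsum(df$latency + df$duration)  # running total of latency + duration
  df$Start <- end - df$duration            # each saccade starts `duration` before it ends
  df$End <- end
  df
}

# Sample rows from the question
sacc <- data.frame(latency = c(641, 98, 252, 229),
                   duration = c(76, 57, 49, 43))
res <- add_start_end(sacc)
print(res)

# A one-row data frame works too, whereas the loop bounds would misbehave:
one_row <- add_start_end(data.frame(latency = 2934, duration = 50))
print(one_row)
print(1:(nrow(one_row) - 1))  # 1 0, not an empty sequence
```

If a loop is still needed elsewhere, seq_len(nrow(x) - 1) returns an empty sequence for a one-row data frame, so the loop body is skipped.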
Secondly, you are not assigning the result of the CSV load to a variable in your environment.
If you want to process all the files in one go, change the data-loading code to this:
data.list <- lapply(sent_files, function(x){
  data <- read.csv(x, header = TRUE, sep = ",")
  return(data)
})
data <- do.call("rbind", data.list)
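One caveat, assuming Start and End should restart in every file: calling cumsum() on the combined data frame would carry the running total across file boundaries. A minimal sketch of a per-file alternative follows, which computes the columns for each file and writes the result to a separate folder. The names add_start_end, process_folder, in_dir and out_dir are hypothetical, and the demo files are made up only so that the example runs on its own.

```r
# Hypothetical helper: adds Start and End columns to one data frame.
# Assumes numeric columns named latency and duration.
add_start_end <- function(df) {
  end <- cumsum(df$latency + df$duration)
  df$Start <- end - df$duration
  df$End <- end
  df
}

# Hypothetical wrapper: processes every .csv in in_dir and writes the
# augmented copy to out_dir under the same file name.
process_folder <- function(in_dir, out_dir) {
  files <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  results <- lapply(files, function(f) {
    df <- add_start_end(read.csv(f, header = TRUE))
    write.csv(df, file.path(out_dir, basename(f)), row.names = FALSE)
    df  # return the data frame so lapply collects it
  })
  names(results) <- basename(files)
  results
}

# Demo with two made-up files in temporary folders
in_dir <- tempfile("sacc_in")
out_dir <- tempfile("sacc_out")
dir.create(in_dir)
write.csv(data.frame(latency = c(641, 98), duration = c(76, 57)),
          file.path(in_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(latency = 2934, duration = 50),
          file.path(in_dir, "b.csv"), row.names = FALSE)

results <- process_folder(in_dir, out_dir)
print(results)
```

Writing to a separate output folder keeps the original files untouched. To overwrite them instead, pass the same path for both arguments, ideally after checking the output on a copy.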
Upvotes: 1