Reputation: 11
I am very new to R so this may be a silly question. Please bear with…
We have assessed participants' attention in our study. Each participant completed 365 trials in one of two conditions; we noted responses, accuracy, etc. Now, the first row of each column represents the headers for the above:
participant_id trial condition accuracy etc.
101 1 0 1 ...
101 2 0 1 ...
101 3 0 0 ...
102 1 3 1 ...
102 2 3 0 ...
I want to calculate the overall average accuracy for the first versus the last 120 trials. Note: of the 365 trials, the first five are for practise of the task only. Thus, I am looking to get the descriptives (mean, standard deviation etc.) for the overall accuracy on trials 6-125 (first 120) and 246-365 (last 120).
I have tried using the subset() command to split my data up, but I am not sure it's the appropriate function. I am also uncertain about the best way to then calculate my means.
#split data.sub into first and last 120 trials
data.sub120=subset(data.sub, data.sub$trial== 6:125)
data.sub120last=subset (data.sub, data.sub$trial== 246:365)
stat.desc (data.sub120,data.sub120last)
Any help would be appreciated - sorry if I'm wasting anyone's time, still learning!
Thanks!
Upvotes: 1
Views: 410
Reputation: 7714
Here is another solution, in line with Brandson's, using the data.table package. It's faster than plyr, and I find the syntax for aggregation problems more intuitive. Here is the documentation for further reference.
demo.data <- data.frame(participant.id = c(rep(101, 365), rep(102, 365), rep(103, 365)),
trial = c(1:365, 1:365, 1:365),
condition = letters[1:5],
accuracy = rbinom(365*3, 1, 0.5))
require("data.table")
DT <- data.table(demo.data)
# cut() bins are right-closed: (0,5], (5,125], (125,245], (245,365]
DT$fc_trial <- cut(DT$trial, breaks = c(0, 5, 125, 245, 365),
labels = c("Practice","First120","Middle","Last120"))
result <- DT[,j=list(mean_accuracy = mean(accuracy),
sd_accuracy = sd(accuracy)
)
, by = fc_trial]
print(result)
# fc_trial mean_accuracy sd_accuracy
# 1: Practice 0.6000000 0.5070926
# 2: First120 0.5151515 0.5004602
# 3: Middle 0.5833333 0.4936928
# 4: Last120 0.4677871 0.4996615
Upvotes: 1
Reputation: 44648
I find it good practice to create a variable that describes the subset and store it with my data for future use. You'll thank yourself later for being able to reproduce large parts of your analysis (bonus points to yourself for naming variables in a manner that has intrinsic meaning to you).
First, let's create a basic factor based on your criteria and append it to your dataset:
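# cut() bins are right-closed, so these breaks give trials 1-5, 6-125, 126-245, 246-365
mydata$trialsplit <- cut(mydata$trial, breaks = c(0, 5, 125, 245, 365),
labels = c("Practice","First120","Middle","Last120"))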
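Because cut() uses right-closed intervals by default, it is worth confirming that each bin holds the trials you expect. Here is a minimal check in base R, assuming 365 trials per participant and the 6-125 / 246-365 split from the question (trial is a made-up vector for a single participant, not your data):

```r
# Hypothetical trial numbers for one participant (1 to 365)
trial <- 1:365

# Right-closed bins: (0,5], (5,125], (125,245], (245,365]
trialsplit <- cut(trial,
                  breaks = c(0, 5, 125, 245, 365),
                  labels = c("Practice", "First120", "Middle", "Last120"))

# Count the trials that fall in each bin
counts <- table(trialsplit)
print(counts)

# Smallest and largest trial number inside each bin
bin_ranges <- tapply(trial, trialsplit, range)
print(bin_ranges)
```

If the counts for First120 and Last120 are not both 120, the breaks are off by one somewhere.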
I'm also a fan of the plyr package so I would use this in a manner similar to Maiasaura. If you just need a summary table, you can do the following:
library(plyr)  # ddply() lives in the plyr package
ddply(mydata, .(trialsplit), summarize,
mean_condition = mean(condition),
sd_condition = sd(condition),
mean_accuracy = mean(accuracy),
sd_accuracy = sd(accuracy)
)
If you'd like to append the information to your data instead of generating a summary, change the word "summarize" to "transform".
Stat testing your data after saving the cut variable now becomes quite easy as well:
# Does accuracy change from the first 120 to the last 120 trials?
t.test(mydata$accuracy[mydata$trialsplit == "First120"],
mydata$accuracy[mydata$trialsplit == "Last120"])
Upvotes: 1
Reputation: 11597
You can subset with inequalities:
## creating data for demonstration purposes
demo.data <- data.frame(participant.id = c(rep(101, 365), rep(102, 365), rep(103, 365)),
trial = c(1:365, 1:365, 1:365),
accuracy = rbinom(365*3, 1, 0.5))
## getting the first 120 trials
data.sub120 <- demo.data[demo.data$trial>5 & demo.data$trial<126,]
## getting the last 120 trials
data.sub120last <- demo.data[demo.data$trial>245 & demo.data$trial<366,]
## taking the means
mean(data.sub120$accuracy)
mean(data.sub120last$accuracy)
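The == comparison in the question recycles 6:125 element by element against trial, so it only keeps rows where the two happen to line up. If you would rather list the trial numbers than write inequalities, %in% tests membership instead. Here is a small sketch that rebuilds the same demo.data layout (the seed is arbitrary and only there to make the made-up data reproducible):

```r
set.seed(1)  # arbitrary seed for the simulated data
demo.data <- data.frame(participant.id = rep(c(101, 102, 103), each = 365),
                        trial = rep(1:365, times = 3),
                        accuracy = rbinom(365 * 3, 1, 0.5))

# %in% checks whether each trial number is a member of the given set
first120 <- demo.data[demo.data$trial %in% 6:125, ]
last120  <- demo.data[demo.data$trial %in% 246:365, ]

# 120 trials x 3 participants = 360 rows in each subset
nrow(first120)
nrow(last120)

# Descriptives for accuracy in each block
c(mean = mean(first120$accuracy), sd = sd(first120$accuracy))
c(mean = mean(last120$accuracy),  sd = sd(last120$accuracy))
```

The same condition also works inside subset(), e.g. subset(demo.data, trial %in% 6:125).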
Upvotes: 1
Reputation: 32986
library(plyr)
# ddply takes a data.frame, splits by a variable, applies a fn,
# and returns everything back to a data.frame
results <- ddply(data.sub, .(participant_id), function(x) {
# order the data by trial number
x <- arrange(x, trial)
# Take rows 6-125, and only columns 3 and 4
# since they are the only numeric ones in your example above,
# and apply the mean function to each column
# turn that into a data.frame
result <- data.frame(t(apply(x[6:125, c(3,4)], 2, mean)))
# add the participant ID
result$participant_id <- unique(x$participant_id)
result
})
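A similar per-participant summary can be sketched in base R with aggregate(), selecting by trial number rather than by row position so that it does not depend on the rows being sorted. This assumes the participant_id, trial and accuracy columns from the question; the data and the block column below are made up for illustration:

```r
set.seed(42)  # arbitrary seed for the simulated data
data.sub <- data.frame(participant_id = rep(c(101, 102), each = 365),
                       trial = rep(1:365, times = 2),
                       accuracy = rbinom(365 * 2, 1, 0.5))

# Label the two blocks of interest; all other trials stay NA
data.sub$block <- NA_character_
data.sub$block[data.sub$trial %in% 6:125]   <- "first120"
data.sub$block[data.sub$trial %in% 246:365] <- "last120"

# Mean accuracy per participant and block
# (rows with an NA block are dropped by the formula interface)
per_participant <- aggregate(accuracy ~ participant_id + block,
                             data = data.sub, FUN = mean)
print(per_participant)
```

Swapping FUN = mean for FUN = sd gives the per-participant standard deviations in the same layout.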
Upvotes: 1