ddply not returning values from function split by variable

Question

I'm using the ddply function (plyr) to calculate something separately by participant id (pid). However, for some reason it's not returning separate values by pid, but rather the same value across all pid.

Sample data:

sdt<-c("Hit","Hit","Miss","Miss","False Alarm","Correct Reject","Correct Reject","Correct Reject",
   "Hit","Hit","Hit","Miss","False Alarm","False Alarm","False Alarm","Correct Reject")

pid<-c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)

adhd_p<-data.frame(sdt,pid)

Function:

ddply(adhd_p, "pid", summarise,
  hitrate=(count(adhd_p$sdt=="Hit")[[2,2]])/((count(adhd_p$sdt=="Hit")[[2,2]])+(count(adhd_p$sdt=="Miss")[[2,2]])),
  falsealarmrate=(count(adhd_p$sdt=="False Alarm")[[2,2]])/((count(adhd_p$sdt=="False Alarm")[[2,2]])+(count(adhd_p$sdt=="Correct Reject")[[2,2]])))

If it helps to understand what I'm calculating: participants can either "Hit" (respond affirmatively to a target), "Miss" (do not respond to a target), "Correct Reject" (do not respond to a distractor), or "False Alarm" (respond affirmatively to a distractor). Thus, "hitrate" is hits / (hits + misses), and "falsealarmrate" is false alarms / (false alarms + correct rejects).

What am I doing wrong?

Thanks for your time.

Edit: Above problem solved very quickly by editing code to

 ddply(adhd_p, "pid", summarise,
  hitrate=(count(sdt=="Hit")[[2,2]])/((count(sdt=="Hit")[[2,2]])+(count(sdt=="Miss")[[2,2]])),
  falsealarmrate=(count(sdt=="False Alarm")[[2,2]])/((count(sdt=="False Alarm")[[2,2]])+(count(sdt=="Correct Reject")[[2,2]])))

I realize now that I need to split over two variables rather than just one. However adding a time variable:

time<-c(1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8)

And merging it in with the others

adhd_p<-data.frame(sdt,pid,time)

Makes the new script produce a "subscript out of bounds" error.

ddply(adhd_p, .(pid,time), summarise,
  hitrate=(count(sdt=="Hit")[[2,2]])/((count(sdt=="Hit")[[2,2]])+(count(sdt=="Miss")[[2,2]])),
  falsealarmrate=(count(sdt=="False Alarm")[[2,2]])/((count(sdt=="False Alarm")[[2,2]])+(count(sdt=="Correct Reject")[[2,2]])))

Any thoughts?

Joe · Accepted Answer

What you need to be doing:

ddply(adhd_p, "pid", summarise,
  hitrate=(count(sdt=="Hit")[[2,2]])/((count(sdt=="Hit")[[2,2]])+(count(sdt=="Miss")[[2,2]])),
  falsealarmrate=(count(sdt=="False Alarm")[[2,2]])/((count(sdt=="False Alarm")[[2,2]])+(count(sdt=="Correct Reject")[[2,2]])))

Why you need to be doing it:

When you call ddply with summarise, each expression is evaluated with the columns of the data frame being summarised available as variables. This is similar to calling attach() on that data frame; naming a column without referencing the data frame explicitly still finds the correct column.

ddply first splits .data (adhd_p in your case) into pieces based on the id columns supplied (in this case, pid) and then calls summarise on each piece. So, if you reference columns by their bare names as above, calculations are done with the portion of the sdt column corresponding to each pid. However, if you reference the column and data frame explicitly (adhd_p$sdt in your case), R looks up the whole, unsplit adhd_p in the global environment, so every group gets the same value computed from the entire column.
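The difference is easy to see on the sample data. The following is just a small sketch: the output column names hits_whole and hits_group are made up for illustration, and it assumes adhd_p is defined at the top level of the script so that it can be found from inside ddply.

```r
library(plyr)

# Sample data from the question.
sdt <- c("Hit", "Hit", "Miss", "Miss", "False Alarm",
         "Correct Reject", "Correct Reject", "Correct Reject",
         "Hit", "Hit", "Hit", "Miss", "False Alarm",
         "False Alarm", "False Alarm", "Correct Reject")
pid <- rep(c(1, 2), each = 8)
adhd_p <- data.frame(sdt, pid)

res <- ddply(adhd_p, "pid", summarise,
  # adhd_p$sdt is the full 16-element column, whatever the current group is.
  hits_whole = sum(adhd_p$sdt == "Hit"),
  # Bare sdt is only the slice of the column belonging to the current pid.
  hits_group = sum(sdt == "Hit"))

print(res)
```

Here hits_whole should be 5 for both participants (the total number of hits in the data), while hits_group should be 2 for pid 1 and 3 for pid 2.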

Edit: the code below is both less messy and won't raise an error if one of the response types doesn't occur in a group:

ddply(adhd_p, .(pid, time), summarise,
      hitrate=sum(sdt=="Hit")/(sum(sdt=="Hit")+sum(sdt=="Miss")),
      falsealarmrate=sum(sdt=="False Alarm")/(sum(sdt=="False Alarm")+sum(sdt=="Correct Reject")))
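As for why the count(...)[[2,2]] version breaks once you split by pid and time: each group then holds a single trial, so the comparison yields only one distinct value and count() returns a one-row table with no second row to index. A rough sketch of that situation follows; one_trial and the other variable names are made up for illustration.

```r
library(plyr)

# A group containing a single trial, as produced by splitting on pid and time.
one_trial <- c("Hit")

# count() tabulates only the values that actually occur, so the table has
# a single row (TRUE) and asking for row 2 fails.
tab <- count(one_trial == "Hit")
count_result <- tryCatch(tab[[2, 2]], error = function(e) conditionMessage(e))

# sum() simply counts the TRUE values and gives 0 when there are none.
hits <- sum(one_trial == "Hit")
misses <- sum(one_trial == "Miss")
false_alarms <- sum(one_trial == "False Alarm")
correct_rejects <- sum(one_trial == "Correct Reject")

hit_rate <- hits / (hits + misses)
false_alarm_rate <- false_alarms / (false_alarms + correct_rejects)  # 0 / 0

print(count_result)
print(hit_rate)
print(false_alarm_rate)
```

Note that a rate whose denominator is zero comes back as NaN rather than an error, so with one trial per group you will see NaN for whichever rate doesn't apply to that trial.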
