user2205323

Reputation: 83

Calculate number of changes of a variable per individual in a data frame

This might be a very simple question, but I am struggling to solve this problem in R. I have a dataset containing four variables: ID (identifying the participants), Type (with only one value this time), Decision (A or B) and Feedback (0 or 1). The data set for two participants looks like this:

ID   Type    Decision    Feedback
1     1       A           0
1     1       A           0
1     1       B           1
1     1       B           1
1     1       B           0
2     1       A           0
2     1       A           1
2     1       A           1
2     1       A           0
2     1       B           1
etc...

I want to calculate the number of changes in the decision process as a function of the previous feedback. In other words, if the participant chose A and received negative feedback, will she/he choose A again (Stay) or B (Shift)? My code for one participant is the following:

Stay = 0
Shift = 0

for (i in 2:length(mydf$Type)) {
    if (mydf$Decision[i] == "A" && mydf$Feedback[i-1] == 1 && mydf$Decision[i-1] == "A") {
        Stay = Stay + 1
    }
    else if (mydf$Decision[i] == "B" && mydf$Feedback[i-1] == 1 && mydf$Decision[i-1] == "B") {
        Stay = Stay + 1
    }
    else if (mydf$Decision[i] == "A" && mydf$Feedback[i-1] == 1 && mydf$Decision[i-1] == "B") {
        Shift = Shift + 1
    }
    else if (mydf$Decision[i] == "B" && mydf$Feedback[i-1] == 1 && mydf$Decision[i-1] == "A") {
        Shift = Shift + 1
    }
}

However, my data frame contains 20 participants and I don’t know how to extend my code to get the number of stays and shifts for each participant (i.e., to get something like this at the end):

#ID    Stay    Shift
#1     10      10
#2     16      4
#etc...

Thank you very much for your help in advance.

Upvotes: 3

Views: 197

Answers (3)

Matthew Plourde

Reputation: 44614

This is a slightly hairier alternative using the embed function, as mentioned in the comments to @DavidRobinson's answer.

d<-read.table(text="ID   Type    Decision    Feedback
1     1       A           0
1     1       A           0
1     1       B           1
1     1       B           1
1     1       B           0
2     1       A           0
2     1       A           1
2     1       A           1
2     1       A           0
2     1       B           1", header=TRUE)

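do.call(rbind,
    by(d, d$ID, function(x) {
        # TRUE when both decisions in a consecutive pair are the same
        f <- function(x) length(unique(x)) == 1
        # embed(..., 2) pairs each decision with the one before it
        stay <- apply(embed(as.vector(x$Decision), 2), 1, f)
        # feedback on the earlier trial of each pair; head(, -1) avoids the
        # precedence trap in 1:nrow(x)-1, which parses as (1:nrow(x)) - 1
        neg.feedback <- head(x$Feedback, -1) == 1
        c(Stay = sum(stay & neg.feedback), Shift = sum((! stay) & neg.feedback))
    })
)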
#   Stay  Shift
# 1    2      0
# 2    2      0
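
To see what embed contributes here, the following minimal sketch applies it to the first participant's decisions from the sample data. The variable names decisions, pairs and stay are illustrative only and do not appear in the answer above.

```r
# Decisions of participant 1 from the sample data
decisions <- c("A", "A", "B", "B", "B")

# embed(x, 2) returns one row per consecutive pair:
# column 1 holds the current value, column 2 the previous one
pairs <- embed(decisions, 2)
print(pairs)

# A pair counts as a "stay" when both columns match
stay <- pairs[, 1] == pairs[, 2]
print(stay)
# [1]  TRUE FALSE  TRUE  TRUE
```

The answer's helper f reaches the same result row by row through length(unique(x)) == 1; the direct column comparison is just a shorter way to inspect it.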

Upvotes: 1

Ricardo Saporta

Reputation: 55350

How about a nice breakdown by ID and Feedback:

  library(data.table)
  X <- data.table(mydf, key="ID")

  # factor() first: as.numeric() on a character column would give NA
  X[, list(Dif=abs(diff(as.numeric(factor(Decision)))),
          FB=head(Feedback, -1))
        , by=ID][,list(Shifted=sum(Dif), Stayed=length(Dif)-sum(Dif)), by=list(ID,FB)]

  #     ID FB Shifted Stayed
  #  1:  1  0       1      1
  #  2:  1  1       0      2
  #  3:  2  0       1      1
  #  4:  2  1       0      2

Or, if you don't want the breakdown by Feedback, it is even more succinct:

X[ , {Dif=abs(diff(as.numeric(factor(Decision))));
     list(Shifted=sum(Dif), Stayed=length(Dif)-sum(Dif))}
  , by=list(ID)]

#      ID Shifted Stayed
# 1:  1       1      3
# 2:  2       1      3
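
The abs(diff(...)) trick relies on Decision having exactly two levels, so that neighbouring codes differ by 0 or 1. As a small base-R sketch (the vector and variable names are illustrative, not taken from the answer), the same shift indicator can be obtained by comparing each decision with its predecessor, which would presumably also hold with more than two options:

```r
# Decisions of participant 1 from the sample data
decisions <- c("A", "A", "B", "B", "B")

# Integer codes: A -> 1, B -> 2
codes <- as.integer(factor(decisions))

# 1 where the decision changed from one trial to the next, 0 otherwise
dif <- abs(diff(codes))
print(dif)
# [1] 0 1 0 0

# Direct comparison of each decision with the one before it
shifted <- head(decisions, -1) != tail(decisions, -1)
print(sum(shifted))    # number of shifts
# [1] 1
print(sum(!shifted))   # number of stays
# [1] 3
```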

Upvotes: 1

David Robinson

Reputation: 78600

This is best done using ddply in the plyr package (you'll have to install it), which splits up a data frame based on one of the columns, does some analysis on each piece, and then recombines the results into a new data frame.

First, write a function num.stay.shift that calculates your stay and shift values given a single subset of the data frame (explained in comments):

num.stay.shift = function(d) {
    # vector of TRUE or FALSE for whether the feedback on the previous trial was 1
    negative.feedback = (head(d$Feedback, -1) == 1)
    # vector of TRUE or FALSE for whether the decision stayed the same at each point
    stay = head(d$Decision, -1) == tail(d$Decision, -1)
    # summarize as two values: the number that stayed when feedback == 1,
    # and the number that shifted when feedback == 1
    c(Stay=sum(stay[negative.feedback]), Shift=sum(!stay[negative.feedback]))
}

Then, use ddply to apply that function to each of the individuals within the data frame (called tab here), splitting it up by ID:

library(plyr)
print(ddply(tab, "ID", num.stay.shift))

On the subset of the data frame you show, you would end up with

#   ID Stay Shift
# 1  1    2     0
# 2  2    2     0
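
If installing plyr is not an option, the same split-apply-combine idea can be sketched in base R. This is only an illustrative alternative, not part of the answer above: it rebuilds the sample data as tab, repeats num.stay.shift (with the feedback vector renamed prev.feedback) so the block runs on its own, and uses split, lapply and rbind in place of ddply.

```r
# Sample data from the question (two participants)
tab <- data.frame(
    ID       = rep(1:2, each = 5),
    Type     = 1,
    Decision = c("A", "A", "B", "B", "B", "A", "A", "A", "A", "B"),
    Feedback = c(0, 0, 1, 1, 0, 0, 1, 1, 0, 1)
)

# Same helper as above, repeated so this block is self-contained
num.stay.shift <- function(d) {
    # TRUE where the feedback on the previous trial was 1
    prev.feedback <- head(d$Feedback, -1) == 1
    # TRUE where the decision equals the previous decision
    stay <- head(d$Decision, -1) == tail(d$Decision, -1)
    c(Stay = sum(stay[prev.feedback]), Shift = sum(!stay[prev.feedback]))
}

# Split by ID, apply the helper to each piece, bind the results row-wise
pieces <- lapply(split(tab, tab$ID), num.stay.shift)
result <- do.call(rbind, pieces)
result <- data.frame(ID = names(pieces), result, row.names = NULL)
print(result)
#   ID Stay Shift
# 1  1    2     0
# 2  2    2     0
```

Note that ID comes back as a character column here because it is taken from the names of the split list; wrap it in as.integer() if a numeric ID is needed.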

Upvotes: 3
