Count repetitions of a variable length sequence across data subsets

Question

Answers or pointers in the right direction are appreciated.

- I have a dataset that is organized by group (id).
- There is a column (trial) that indicates the trial the data corresponds to. This value runs from 1 to some number, and each trial value can be repeated a variable number of times (e.g., 1122234444).
- Sequences through the trial values are repeated within each group. E.g., within each id, you go through a sequence of trial values, then trial restarts at 1 and goes through the sequence again, some number of times.

I need to know how many times the trial sequence has been repeated within each group of id.

The desired output is the variable "repetition".

The "repetition" variable should start at 1 and stay at 1 until the sequence restarts at 1, where it should move to 2 to indicate that the trial sequence is on its 2nd repeat.

The max number of trials, ids, and the number of repetitions are always variable, but the trial sequence always goes (repeating at variable length) 1,2,3,....

id <- sort(rep(c("a", "b"), each = 4, times = 2))
trial <- rep(1:2, each = 2 , times = 2)
repetition <- rep(1:2, each = 4, times = 2)

df <- data.frame(id, trial, repetition)

   id trial repetition
1   a     1          1
2   a     1          1
3   a     2          1
4   a     2          1
5   a     1          2
6   a     1          2
7   a     2          2
8   a     2          2
9   b     1          1
10  b     1          1
11  b     2          1
12  b     2          1
13  b     1          2
14  b     1          2
15  b     2          2
16  b     2          2

Alias · Accepted Answer

I assumed your data looks something like this:

trial=rep(c(1,1,2,2,2,3,4,4,4,4,1,2,2,2,2,2,3,3,3,4,5,5,5,6,6,7,1,1,2,3,3,4,5,6,7,7,7),2)
id=c(rep("a",length(trial)/2),rep("b",length(trial)/2))
df=data.frame(id,trial,repetition=numeric(length(trial)))

Then this code does what you are asking for as far as I understood:

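counter=1
for(i in seq_len(nrow(df))){

  if(i>1){
    if(df$id[i-1] != df$id[i]){
      # new id: start counting again at 1
      counter=1
    } else {

      # trial dropped compared to the previous row: the sequence restarted
      if(df$trial[i-1]>df$trial[i]){
        counter=counter+1
      }

    }
    df$repetition[i]=counter
  }else{
    # the first row is always in the first repetition
    df$repetition[i]=1
  }
}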

In my data frame the repetition column already exists, but this also works if the data frame df doesn't have a repetition column yet. In that case the code in the loop adds it.
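
The same rule (start at 1 for each id, add 1 whenever trial drops compared to the previous row) can probably also be written without an explicit loop. The following is only a minimal sketch in base R, not part of the loop above: count_repetitions is a hypothetical helper name, and it assumes the rows are already in the intended order within each id.

```r
# Hypothetical helper (name chosen for illustration): number the passes
# through the trial sequence within each id.
count_repetitions <- function(trial, id) {
  # ave() applies FUN to the trial values of each id separately and
  # returns the results in the original row order.
  ave(trial, id, FUN = function(x) {
    # diff(x) < 0 marks rows where trial drops, i.e. the sequence restarts.
    # The leading FALSE keeps the first row of each id in repetition 1.
    cumsum(c(FALSE, diff(x) < 0)) + 1
  })
}

# Same values as the example data in the question, without the
# hand-filled repetition column.
id <- sort(rep(c("a", "b"), each = 4, times = 2))
trial <- rep(1:2, each = 2, times = 4)
df <- data.frame(id, trial)

df$repetition <- count_repetitions(df$trial, df$id)
df$repetition
# 1 1 1 1 2 2 2 2 1 1 1 1 2 2 2 2
```

Like the loop, this detects a restart only by a decrease in trial, so it cannot separate repetitions of a sequence that consists of trial 1 alone.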
