griffmer

Reputation: 377

Count repetitions of a variable length sequence across data subsets

Answers or pointers in the right direction are appreciated.

- I have a dataset that is organized by group (id).
- There is a column (trial) that indicates the trial the data corresponds to. This value runs from 1 to some number. Each trial value can be repeated a variable number of times (e.g., 1122234444).
- Sequences through the trial values are repeated within each group. E.g., within each id you go through a sequence of trial values, then trial restarts at 1 and goes through the sequence again, some number of times.

I need to know how many times the trial sequence has been repeated within each group of id.

The desired output is the variable "repetition".

The "repetition" variable should start at 1 and stay at 1 until the sequence restarts at 1, at which point it should move to 2 to indicate that the trial sequence is on its 2nd repeat.

The max number of trials, ids, and the number of repetitions are always variable, but the trial sequence always goes (repeating at variable length) 1,2,3,....

id <- sort(rep(c("a", "b"), each = 4, times = 2))
trial <- rep(1:2, each = 2 , times = 2)
repetition <- rep(1:2, each = 4, times = 2)

df <- data.frame(id, trial, repetition)

   id trial repetition
1   a     1          1
2   a     1          1
3   a     2          1
4   a     2          1
5   a     1          2
6   a     1          2
7   a     2          2
8   a     2          2
9   b     1          1
10  b     1          1
11  b     2          1
12  b     2          1
13  b     1          2
14  b     1          2
15  b     2          2
16  b     2          2

Upvotes: 2

Views: 372

Answers (2)

Sotos

Reputation: 51592

Here is an idea using dplyr together with splitstackshape. We first use new = cumsum(c(1, diff(trial) != 0)) to label each run of identical consecutive trial values. We then group by id and new and count the rows in each run (new1). We slice to keep the first row of each run and use cumsum(trial == 1) within id to get the repetition. Finally, we use the splitstackshape function expandRows, which replicates each row by the count we stored in new1. We finish by tidying a bit with select and ungroup.

library(dplyr)
library(splitstackshape)

df %>% 
  mutate(new = cumsum(c(1, diff(trial) != 0))) %>% 
  group_by(id, new) %>% 
  mutate(new1 = n()) %>% 
  slice(1L) %>% 
  group_by(id) %>% 
  mutate(repetition = cumsum(trial == 1)) %>% 
  expandRows('new1') %>% 
  select(-new) %>% 
  ungroup()
# A tibble: 16 × 3
#       id trial repetition
#   <fctr> <int>      <int>
#1       a     1          1
#2       a     1          1
#3       a     2          1
#4       a     2          1
#5       a     1          2
#6       a     1          2
#7       a     2          2
#8       a     2          2
#9       b     1          1
#10      b     1          1
#11      b     2          1
#12      b     2          1
#13      b     1          2
#14      b     1          2
#15      b     2          2
#16      b     2          2
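
To make the individual steps easier to follow, here is a minimal base R sketch of the same idea on a single id's worth of trial values. The vector and the variable names run_id, runs and repetition_per_run are made up for illustration; rle() is used here in place of the group_by/slice/expandRows steps, so this is not the pipeline above, only the logic behind it.

```r
# Illustration only: trial values for ONE id (assumed example data)
trial <- c(1, 1, 2, 2, 1, 1, 2, 2)

# Label each run of identical consecutive values (the `new` column above)
run_id <- cumsum(c(1, diff(trial) != 0))

# rle() gives one entry per run: its value and its length (like slice + n())
runs <- rle(trial)

# A repetition starts every time a run of trial == 1 begins
repetition_per_run <- cumsum(runs$values == 1)

# Expand back to one value per original row (the role of expandRows)
repetition <- rep(repetition_per_run, times = runs$lengths)

print(run_id)
print(repetition)
```

With several ids, the same steps would have to be applied per id, which is what group_by(id) does in the pipeline.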

Upvotes: 1

Alias

Reputation: 149

I assumed your data looks something like this:

trial=rep(c(1,1,2,2,2,3,4,4,4,4,1,2,2,2,2,2,3,3,3,4,5,5,5,6,6,7,1,1,2,3,3,4,5,6,7,7,7),2)
id=c(rep("a",length(trial)/2),rep("b",length(trial)/2))
df=data.frame(id,trial,repetition=numeric(length(trial)))

Then this code does what you are asking for as far as I understood:

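counter=1
for(i in 1:nrow(df)){

  if(i>1){
    if(df$id[i-1] != df$id[i]){
      # new id: restart the repetition count
      counter=1
    } else {

      # trial dropped back within the same id: a new repetition begins
      if(df$trial[i-1]>df$trial[i]){
        counter=counter+1
      }

    }  
    df$repetition[i]=counter
  }else{
    # the first row always starts repetition 1
    df$repetition[i]=1
  }
}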

In my data frame the repetition column already exists, but this also works if df doesn't have a repetition column yet: the assignment inside the loop will create it.
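
The same rule the loop applies (restart at 1 for each id, add 1 whenever trial drops below the previous row's value) can probably also be written without an explicit loop. The following is a sketch using base R's ave(), on a small made-up data frame; it assumes the rows of each id are contiguous and already in order, as in the question.

```r
# Small made-up example: two ids with runs of different lengths
trial <- c(1, 1, 2, 3, 1, 2, 2, 1, 1, 2, 1, 2)
id <- rep(c("a", "b"), times = c(7, 5))
df <- data.frame(id, trial)

# Within each id: start at 1 and add 1 whenever trial decreases,
# i.e. whenever the sequence restarts
df$repetition <- ave(df$trial, df$id,
                     FUN = function(x) cumsum(c(1, diff(x) < 0)))

print(df)
```

Like the loop, this detects a restart by a decrease in trial, so it does not depend on every repetition reaching the same maximum trial number.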

Upvotes: 1
