Vanessa S.
Vanessa S.

Reputation: 133

Create counter in dataframe that gets reset based on changes in value or new ID

I have a df with an ID column and another column coding 1 or 0 depending on which response the person made. I'd like to create a third, new column that contains a counter that gets reset to 1 and starts counting again each time a person changes the response. Also, the counter should be reset whenever the ID changes. Here is a reproducible df & an example of how the output should look like:

df <- data.frame(ID = c(rep("abc546", 7), rep("azg467", 7), 
rep("c7hg6", 7)), 
response=c(1,1,1,0,0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,1,0))             
df

Desired output:

ID   response counter
1  abc546   1       1
2  abc546   1       2
3  abc546   1       3
4  abc546   0       1
5  abc546   0       2
6  abc546   1       1
7  abc546   0       1
8  azg467   0       1
9  azg467   1       1 
10 azg467   0       1
11 azg467   0       2
12 azg467   0       3
13 azg467   1       1
14 azg467   0       1
15  c7hg6   1       1
16  c7hg6   0       1
17  c7hg6   0       2
18  c7hg6   0       3
19  c7hg6   1       1
20  c7hg6   1       2
21  c7hg6   0       1

I tried adapting a function from another question I posted a while back ( Counting unequal elements in-between equal elements in R df column ), that also required consideration and resetting the counter based on a persons's id, but this considered a different rule and so far I failed to adapt it in a way that would provide the correct output (attempt 1). Attempt 1 will reset the counter back to 1 when the response changes, but on the next trial, the counter will continue with the overall count.

I also read several SO postings such as this one: Numbering rows within groups in a data frame and tried several things based on these posts (e.g. attempt 2 see below, which resets the counter and overall gives the output I want but does not consider the ID variable and hence does not reset the counter once a new ID starts). I also tried to combine attempt 2 with a groupby function and other things but I never get the output I need.

Attempt 1:

 my.function <- function(x) {
    ret <- rep(1, length(x))   # start counter at 1
    for (i in 2:length(x)) { 
      for (j in (i-1):1) {   
        if (x[j] == x[i]) {
          ret[i] = i  
        } 
        else {
          ret[i] = 0 
          }
          break
        }
      }
    return(ret)
  }

df %>%group_by(ID) %>%mutate(counter = my.function(response)) 

Attempt 2:

sequence(rle(as.character(df$response))$lengths)

I hope you can help with this problem, as I'm out of ideas of what else to try. In case I should have overlooked an SO entry that solves my problem, it would be great if you could point me there. Many thanks!!!

Upvotes: 0

Views: 1912

Answers (2)

A. Suliman
A. Suliman

Reputation: 13125

Another option using rle

library(dplyr)
df %>% group_by(ID) %>% mutate(counter = unlist(sapply(rle(response)$lengths, seq_len)))

# A tibble: 21 x 3
# Groups:   ID [3]
      ID     response counter
     <fct>     <dbl>   <int>
  1 abc546        1       1
  2 abc546        1       2
  3 abc546        1       3
  4 abc546        0       1
  5 abc546        0       2
  6 abc546        1       1
  7 abc546        0       1
  8 azg467        0       1
  9 azg467        1       1
  10 azg467        0       1
  # ... with 11 more rows

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388817

Using dplyr, we can create a grouping variable using cumsum and lag and assign row_number as counter for every group.

library(dplyr)

df %>%
   group_by(ID, group = cumsum(response != lag(response, default = first(response)))) %>%
   mutate(counter = row_number()) %>%
   ungroup() %>%
   select(-group)

#   ID     response counter
#   <fct>     <dbl>   <int>
# 1 abc546        1       1
# 2 abc546        1       2
# 3 abc546        1       3
# 4 abc546        0       1
# 5 abc546        0       2
# 6 abc546        1       1
# 7 abc546        0       1
# 8 azg467        0       1
# 9 azg467        1       1
#10 azg467        0       1
# … with 11 more rows

In data.table, it is a bit simpler since we can use rleid

library(data.table)
setDT(df)[, counter := seq_len(.N), by = .(ID, rleid(response))]

In base R, we can use ave

df$counter <- with(df, ave(response, ID, 
   with(rle(response), rep(seq_along(values), lengths)), FUN = seq_along))

Upvotes: 2

Related Questions