Reputation: 133
I have a df with an ID column and another column coding 1 or 0 depending on which response the person made. I'd like to create a third, new column that contains a counter that gets reset to 1 and starts counting again each time a person changes the response. Also, the counter should be reset whenever the ID changes. Here is a reproducible df & an example of how the output should look like:
df <- data.frame(ID = c(rep("abc546", 7), rep("azg467", 7),
rep("c7hg6", 7)),
response=c(1,1,1,0,0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,1,0))
df
Desired output:
ID response counter
1 abc546 1 1
2 abc546 1 2
3 abc546 1 3
4 abc546 0 1
5 abc546 0 2
6 abc546 1 1
7 abc546 0 1
8 azg467 0 1
9 azg467 1 1
10 azg467 0 1
11 azg467 0 2
12 azg467 0 3
13 azg467 1 1
14 azg467 0 1
15 c7hg6 1 1
16 c7hg6 0 1
17 c7hg6 0 2
18 c7hg6 0 3
19 c7hg6 1 1
20 c7hg6 1 2
21 c7hg6 0 1
I tried adapting a function from another question I posted a while back ( Counting unequal elements in-between equal elements in R df column ), that also required consideration and resetting the counter based on a persons's id, but this considered a different rule and so far I failed to adapt it in a way that would provide the correct output (attempt 1). Attempt 1 will reset the counter back to 1 when the response changes, but on the next trial, the counter will continue with the overall count.
I also read several SO postings such as this one: Numbering rows within groups in a data frame and tried several things based on these posts (e.g. attempt 2 see below, which resets the counter and overall gives the output I want but does not consider the ID variable and hence does not reset the counter once a new ID starts). I also tried to combine attempt 2 with a groupby function and other things but I never get the output I need.
Attempt 1:
my.function <- function(x) {
ret <- rep(1, length(x)) # start counter at 1
for (i in 2:length(x)) {
for (j in (i-1):1) {
if (x[j] == x[i]) {
ret[i] = i
}
else {
ret[i] = 0
}
break
}
}
return(ret)
}
df %>%group_by(ID) %>%mutate(counter = my.function(response))
Attempt 2:
sequence(rle(as.character(df$response))$lengths)
I hope you can help with this problem, as I'm out of ideas of what else to try. In case I should have overlooked an SO entry that solves my problem, it would be great if you could point me there. Many thanks!!!
Upvotes: 0
Views: 1912
Reputation: 13125
Another option using rle
library(dplyr)
df %>% group_by(ID) %>% mutate(counter = unlist(sapply(rle(response)$lengths, seq_len)))
# A tibble: 21 x 3
# Groups: ID [3]
ID response counter
<fct> <dbl> <int>
1 abc546 1 1
2 abc546 1 2
3 abc546 1 3
4 abc546 0 1
5 abc546 0 2
6 abc546 1 1
7 abc546 0 1
8 azg467 0 1
9 azg467 1 1
10 azg467 0 1
# ... with 11 more rows
Upvotes: 1
Reputation: 388817
Using dplyr
, we can create a grouping variable using cumsum
and lag
and assign row_number
as counter
for every group.
library(dplyr)
df %>%
group_by(ID, group = cumsum(response != lag(response, default = first(response)))) %>%
mutate(counter = row_number()) %>%
ungroup() %>%
select(-group)
# ID response counter
# <fct> <dbl> <int>
# 1 abc546 1 1
# 2 abc546 1 2
# 3 abc546 1 3
# 4 abc546 0 1
# 5 abc546 0 2
# 6 abc546 1 1
# 7 abc546 0 1
# 8 azg467 0 1
# 9 azg467 1 1
#10 azg467 0 1
# … with 11 more rows
In data.table
, it is a bit simpler since we can use rleid
library(data.table)
setDT(df)[, counter := seq_len(.N), by = .(ID, rleid(response))]
In base R, we can use ave
df$counter <- with(df, ave(response, ID,
with(rle(response), rep(seq_along(values), lengths)), FUN = seq_along))
Upvotes: 2