biostatguy12
biostatguy12

Reputation: 659

Create a count consecutive variable based on the value in another column

Hi I have something like:

df<-data.frame(group=c(1, 1, 1, 1, 2, 2, 2,3,3,3,4,4,4), 
               number=c(0,1,1,1,1,1,0,1,0,1,0,1,1))

I want my 'want' to count the times 'number'=1 repeats per 'group'. Then I want the max number of times it repeats (want2). I want the 'want' to reset to 1 when 'number'=0:

dfwant<-data.frame(group=c(1, 1, 1, 1, 2, 2, 2,3,3,3,4,4,4), 
               number=c(0,1,1,1,1,1,0,1,0,1,0,1,1),
               want=c(1,1,2,3,1,2,1,1,1,1,1,1,2),
               want2=c(3,3,3,3,2,2,2,1,1,1,2,2,2))

Thank you!

Upvotes: 1

Views: 73

Answers (3)

akrun
akrun

Reputation: 886938

We can do this easily with rleid and rowid from data.table

library(dplyr)
library(data.table)
df %>%
   group_by(group) %>%
   mutate(want = rowid(rleid(number)), want1 = max(want))
# A tibble: 13 x 4
# Groups:   group [4]
#   group number  want want1
#   <dbl>  <dbl> <int> <int>
# 1     1      0     1     3
# 2     1      1     1     3
# 3     1      1     2     3
# 4     1      1     3     3
# 5     2      1     1     2
# 6     2      1     2     2
# 7     2      0     1     2
# 8     3      1     1     1
# 9     3      0     1     1
#10     3      1     1     1
#11     4      0     1     2
#12     4      1     1     2
#13     4      1     2     2

Or using the data.table syntax

library(data.table)
setDT(df)[, want := rowid(rleid(number)),.(group)][, want1 := max(want), group][]

Upvotes: 2

GKi
GKi

Reputation: 39647

In base you can use ave for getting the cumsum per group like:

df$want  <- pmax(1, ave(df$number, df$group, cumsum(df$number==0), FUN=cumsum))
df$want2 <- ave(df$want, df$group, FUN=max)
identical(df, dfwant)
#[1] TRUE

To reset when df$number==0 you can use cumsum(df$number==0) as an additional grouping, as already shown by @Ronak-Shah, for ave.

In case df$number holds also other values than 0 and 1 you have to use:

df$want  <- pmax(1, ave(df$number==1, df$group, cumsum(df$number==0)
    , FUN=cumsum))

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388807

Since we want to reset when number = 0, we group_by group and create a new group whenever the number is 0. We then calculate want by taking cumsum of occurrence of 1 within each group and want2 by taking max in group.

library(dplyr)

df %>%
  group_by(group, group1 = cumsum(number == 0)) %>%
  mutate(want = cumsum(number == 1), 
         want = replace(want, number == 0, 1)) %>%
  group_by(group) %>%
  mutate(want1 = max(want)) %>%
  select(-group1)

#   group number  want want1
#   <dbl>  <dbl> <dbl> <dbl>
# 1     1      0     1     3
# 2     1      1     1     3
# 3     1      1     2     3
# 4     1      1     3     3
# 5     2      1     1     2
# 6     2      1     2     2
# 7     2      0     1     2
# 8     3      1     1     1
# 9     3      0     1     1
#10     3      1     1     1
#11     4      0     1     2
#12     4      1     1     2
#13     4      1     2     2

Upvotes: 3

Related Questions