jmich738
jmich738

Reputation: 1675

count consecutive occurrence and stop once found value

I have a data frame that looks like this:

account <- c('123','123','123','123')
bin <- c(3,6,9,12)
count <- c(0,0,2,0)

df <- data.frame(account,bin,count)
df
> df
  account bin count
1     123   3     0
2     123   6     0
3     123   9     2
4     123  12     0

I want an output that looks like this:

  > df
      account bin count cumCount
    1     123   3     0    1
    2     123   6     0    2
    3     123   9     2    0
    4     123  12     0    0

Basically, I need to count the number of consecutive zeror starting from bin = 3. But once count columns is >0 I want the rest of the values to be zero.

I've looked around the web a bit and here are 2 part solutions that are almost there:

df %>% 
  group_by(count) %>% 
  mutate(id = row_number())


# A tibble: 4 x 4
# Groups:   count [2]
  account   bin count    id
   <fctr> <dbl> <dbl> <int>
1     123     3     0     1
2     123     6     0     2
3     123     9     2     1
4     123    12     0     3

And

   df %>% 
  mutate( x = sequence(rle(
    as.character(count))$lengths))

> df %>% 
+   mutate( x = sequence(rle(
+     as.character(count))$lengths))
  account bin count x
1     123   3     0 1
2     123   6     0 2
3     123   9     2 1
4     123  12     0 1

but they still keep counting after zero is found.
Is there another solution?

Upvotes: 2

Views: 502

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 388807

We could first create a row number column cumCount. After that we replace the values to 0 for index from the first occurrence of non-zero value to the end of dataframe.

df$cumCount = 1:nrow(df)
df$cumCount[which.max(df$count != 0) : nrow(df)] <- 0

df

#  account bin count cumCount
#1     123   3     0        1
#2     123   6     0        2
#3     123   9     2        0
#4     123  12     0        0

In dplyr, it is easier using row_number and replace function

library(dplyr)
df %>%
   mutate(cumCount = replace(row_number(), cumsum(count!=0) > 0, 0))


#  account bin count cumCount
#1     123   3     0        1
#2     123   6     0        2
#3     123   9     2        0
#4     123  12     0        0

The equivalent base R of the above dplyr version would be

df$cumCount <- replace(1:nrow(df), cumsum(df$count != 0) > 0, 0)

Upvotes: 3

Related Questions