Oriol Prat

Reputation: 1047

Incremental counter within a data frame only when a condition is met in R

I would like to create a cumulative counter, per id, that increases only on rows where a condition is met (here, b >= 5).

library(data.table)

DT <- data.table(id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
                 b = c(10L, 5L, 3L, 4L, 2L, 6L, 1L, 3L, 5L, 7L))

I don't get the desired result with rleid, because when the condition is met in consecutive rows the counter is not incremented for the second row:

> DT[,count := rleid(b>=5),id]
> DT
    id  b count
 1:  1 10     1
 2:  1  5     1
 3:  1  3     2
 4:  1  4     2
 5:  1  2     2
 6:  1  6     3
 7:  1  1     4
 8:  2  3     1
 9:  2  5     2
10:  2  7     2

The expected result is

> DT
    id  b count
 1:  1 10     1
 2:  1  5     2
 3:  1  3     2
 4:  1  4     2
 5:  1  2     2
 6:  1  6     3
 7:  1  1     3
 8:  2  3     1
 9:  2  5     2
10:  2  7     3

Upvotes: 4

Views: 985

Answers (3)

Anoushiravan R

Reputation: 21908

We can also use purrr's accumulate function for this. Some notes on this solution:

  • accumulate takes a two-argument function as its .f argument, where .x is the previous (accumulated) value and .y is the current value from vector b.
  • I set the initial value of count to 1 via .init and drop the first value of b (b[-1]), since the first row of each group always gets 1. Each remaining value is then checked through .y: if the condition is met, the count increases by one; otherwise it stays as it is.
library(dplyr)
library(purrr)

DT %>%
  group_by(id) %>%
  mutate(count = accumulate(b[-1], .init = 1,
                            ~ if(.y >= 5) {
                              .x + 1
                            } else {
                              .x
                            }))

# A tibble: 10 x 3
# Groups:   id [2]
      id     b count
   <dbl> <int> <dbl>
 1     1    10     1
 2     1     5     2
 3     1     3     2
 4     1     4     2
 5     1     2     2
 6     1     6     3
 7     1     1     3
 8     2     3     1
 9     2     5     2
10     2     7     3
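
For readers without purrr, the same fold can be sketched for a single group in base R with Reduce(accumulate = TRUE). This is a stand-in for accumulate, not the code from the answer above; the vector b is the id == 1 group from the question, and the helper name step is made up for illustration.

```r
# Values of b for the id == 1 group in the question
b <- c(10L, 5L, 3L, 4L, 2L, 6L, 1L)

# Hypothetical helper: acc is the running count, cur is the current value of b
step <- function(acc, cur) if (cur >= 5) acc + 1 else acc

# Start at 1 for the first row, then fold over the remaining values
count <- Reduce(step, b[-1], 1, accumulate = TRUE)
count
# 1 2 2 2 2 3 3
```

As with accumulate and .init, the result has one more element than b[-1], so it lines up with the rows of the group.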

Upvotes: 3

ThomasIsCoding

Reputation: 101129

Another data.table option with cumsum: subtract the group's first cumulative value and add 1, so that every id starts at 1

> DT[, count := (v <- cumsum(b >= 5)) - v[1] + 1, id][]
    id  b count
 1:  1 10     1
 2:  1  5     2
 3:  1  3     2
 4:  1  4     2
 5:  1  2     2
 6:  1  6     3
 7:  1  1     3
 8:  2  3     1
 9:  2  5     2
10:  2  7     3
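
To see why the offset is needed, here is a small sketch of the same arithmetic on plain vectors, one per id from the question. The names b1, b2, v1, v2, count1 and count2 are only for illustration.

```r
b1 <- c(10L, 5L, 3L, 4L, 2L, 6L, 1L)  # id == 1, first row meets the condition
b2 <- c(3L, 5L, 7L)                   # id == 2, first row does not

v1 <- cumsum(b1 >= 5)  # 1 2 2 2 2 3 3
v2 <- cumsum(b2 >= 5)  # 0 1 2

# Shifting by the first element makes both groups start at 1
count1 <- v1 - v1[1] + 1  # 1 2 2 2 2 3 3
count2 <- v2 - v2[1] + 1  # 1 2 3
```

Without the shift, a group whose first row fails the condition would start at 0 instead of 1.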

Upvotes: 3

akrun

Reputation: 886948

Here is an option with cumsum. Grouped by 'id', take the cumulative sum of the logical expression (b >= 5). For 'id' 2, the first element that is greater than or equal to 5 is at position 2 within the group, so the first row would be 0. In order to make this 1, one option is to convert the result to a factor and then coerce it to integer, which returns the factor's integer codes (these start at 1, like R indexing).

DT[, count := as.integer(factor(cumsum(b >= 5))), id]

-output

DT
    id  b count
 1:  1 10     1
 2:  1  5     2
 3:  1  3     2
 4:  1  4     2
 5:  1  2     2
 6:  1  6     3
 7:  1  1     3
 8:  2  3     1
 9:  2  5     2
10:  2  7     3
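
The factor step can be traced on a plain vector. This is a minimal sketch using the id == 2 values from the question; the names b2, v, f and count are only for illustration.

```r
b2 <- c(3L, 5L, 7L)      # id == 2 group

v <- cumsum(b2 >= 5)     # 0 1 2, starts at 0 because the first row fails
f <- factor(v)           # levels "0" "1" "2"
count <- as.integer(f)   # integer codes of the levels: 1 2 3
```

Because the cumulative sum never decreases and grows by at most one per row, its distinct values are consecutive, so the factor codes are simply those values renumbered from 1.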

Upvotes: 6
