How to count the number of rows a condition has been met

Question

I have a dataframe df of stock liquidity (dput statement below), and I need to measure how many of the previous consecutive three rows the value has been > 10:

  date     sec_id  liquidity   good       count.good.rows
2016-07-29   3277  9.142245 FALSE               0
2016-08-31   3277 11.070555  TRUE               0
2016-09-30   3277 11.934113  TRUE               1
2016-10-31   3277 12.192237  TRUE               2
2016-11-30   3277 10.165183  TRUE               3
2016-12-30   3277  8.414033 FALSE               3
2016-01-29   3426  6.494181 FALSE               0
2016-02-29   3426  8.216213 FALSE               0
2016-03-31   3426 10.081115  TRUE               0
2016-04-29   3426 10.119685  TRUE               1
2016-05-31   3426  8.659732 FALSE               2
2016-06-30   3426  6.790178 FALSE               1
2016-07-29   3426  7.234159 FALSE               0

Note a few things about the data:

There are multiple sec_id values, and I need to do this work on each sec_id value, based on the order of the data column.
I've already added the good column, but can't figure out how to do the count.good.rows column without explicitly using lag(...,1) + lag(...,2) + lag(...,3). This would be a poor solution because I need the 3 to be a variable (I may end up wanting to look at previous 2 rows, or 4 rows).

Any ideas?

Here's my full dput:

df = structure(list(date = structure(c(16829, 16860, 16891, 16920, 16952, 16982, 17011, 17044, 17074, 17105, 17135, 17165, 16829, 16860, 16891, 16920, 16952, 16982, 17011, 17044, 17074, 17105, 17135, 17165), class = "Date"),
    sec_id = c(3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3277L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L, 3426L),
    liquidity = c(4.014428, 3.779665, 4.833813, 5.244417, 7.150838, 7.639399, 9.142245, 11.070555, 11.934113, 12.192237, 10.165183, 8.414033, 6.494181, 8.216213, 10.081115, 10.119685, 8.659732, 6.790178, 7.234159, 8.529101, 9.015898, 8.307979, 8.231237, 8.711095),
    good = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)),
    class = "data.frame", .Names = c("date", "sec_id", "liquidity", "good"), 
    row.names = c(NA, -24L))

CPak · Accepted Answer

Try zoo::rollapply

library(zoo)
df %>%
  group_by(sec_id) %>%
  mutate(count_good_rows = rollapply(good, 3, sum, align="right", partial=TRUE))

# A tibble: 13 x 5
# Groups: sec_id [2]
   # date       sec_id liquidity good  count_good_rows
   #                        
 # 1 2016-07-29   3277      9.14 F                   0
 # 2 2016-08-31   3277     11.1  T                   1
 # 3 2016-09-30   3277     11.9  T                   2
 # 4 2016-10-31   3277     12.2  T                   3
 # 5 2016-11-30   3277     10.2  T                   3
 # 6 2016-12-30   3277      8.41 F                   2
 # 7 2016-01-29   3426      6.49 F                   0
 # 8 2016-02-29   3426      8.22 F                   0
 # 9 2016-03-31   3426     10.1  T                   1
# 10 2016-04-29   3426     10.1  T                   2
# 11 2016-05-31   3426      8.66 F                   2
# 12 2016-06-30   3426      6.79 F                   1
# 13 2016-07-29   3426      7.23 F                   0

Edit If you're only interested in counting the previous three rows

df %>%
  group_by(sec_id) %>%
  mutate(count_good_rows = rollapply(dplyr::lag(good, 1), 3, function(i) sum(i, na.rm=TRUE), align="right", partial=TRUE))

# A tibble: 13 x 5
# Groups: sec_id [2]
   # date       sec_id liquidity good  count_good_rows
   #                        
 # 1 2016-07-29   3277      9.14 F                   0
 # 2 2016-08-31   3277     11.1  T                   0
 # 3 2016-09-30   3277     11.9  T                   1
 # 4 2016-10-31   3277     12.2  T                   2
 # 5 2016-11-30   3277     10.2  T                   3
 # 6 2016-12-30   3277      8.41 F                   3
 # 7 2016-01-29   3426      6.49 F                   0
 # 8 2016-02-29   3426      8.22 F                   0
 # 9 2016-03-31   3426     10.1  T                   0
# 10 2016-04-29   3426     10.1  T                   1
# 11 2016-05-31   3426      8.66 F                   2
# 12 2016-06-30   3426      6.79 F                   2
# 13 2016-07-29   3426      7.23 F                   1

Data

df <- read.table(text="date     sec_id  liquidity   good     
2016-07-29   3277  9.142245 FALSE 
2016-08-31   3277 11.070555  TRUE  
2016-09-30   3277 11.934113  TRUE  
2016-10-31   3277 12.192237  TRUE  
2016-11-30   3277 10.165183  TRUE  
2016-12-30   3277  8.414033 FALSE 
2016-01-29   3426  6.494181 FALSE 
2016-02-29   3426  8.216213 FALSE 
2016-03-31   3426 10.081115  TRUE  
2016-04-29   3426 10.119685  TRUE  
2016-05-31   3426  8.659732 FALSE 
2016-06-30   3426  6.790178 FALSE  
2016-07-29   3426  7.234159 FALSE  ", header=TRUE)

How to count the number of rows a condition has been met

Answers (2)

Related Questions