Hui
Hui

Reputation: 25

Count number of rows between two specific rows

The question is I want to count the number of rows between two rows that those two rows share some same value.

I found a question similar to me but the difference is I want to count the number of A after B in that question and group by id(may not be necessary, since in my data the beginning value will always be 0 for each id) the link is here Counting number of rows between rows

Suppose I have data shows below

 a  <- data.frame(
    id     = c(1, 1, 1, 1, 2, 2, 2, 2, 2), 
    visits = c(0, 0, 1, 0, 0, 1, 2, 0, 0)
 )

and the output I want is

 a <- data.frame(
  id     = c(1, 1,  1, 1, 2,  2,  2, 2, 2), 
  visits = c(0, 0,  1, 0, 0,  1,  2, 0, 0), 
  z      = c(0, 1, NA, 0, 2, NA, NA, 0, 0)
 )

The value in the 3rd column could be NA or o or anything since I will move it so that is no problem. I want to count how many values larger than 0 are between two 0 value.

Many thanks.

Upvotes: 0

Views: 1884

Answers (2)

erocoar
erocoar

Reputation: 5893

This would work in a more general setting than the existing answer, i.e. also add counts for repeating 1's and any other values. The last row per group is NA but that can be changed easily.

   a %>%
      group_by(id) %>%
      mutate(
        c = map_dbl(
          seq_along(visits),
          ~ {
            m <- which(visits[.x] == visits[-seq(.x)])
            if (length(m) == 0) NA else min(m) - 1
          }
          )
        )

# A tibble: 9 x 3
# Groups:   id [2]
     id visits     c
  <dbl>  <dbl> <dbl>
1     1      0     0
2     1      0     1
3     1      1    NA
4     1      0    NA
5     2      0     2
6     2      1    NA
7     2      2    NA
8     2      0     0
9     2      0    NA

Upvotes: 0

MrFlick
MrFlick

Reputation: 206177

If I understand correctly, I think this will do what you want

library(dplyr)
a %>% 
  group_by(id) %>% 
  mutate(grp=cumsum(visits==0)) %>% 
  group_by(id, grp) %>% 
  mutate(z=ifelse(visits!=0, NA, sum(visits!=0))) %>% 
  ungroup() %>% mutate(grp=NULL)

Here we "split" the data into zeros with runs of non zeros using cumsum() in the first mutate. Then we group by those runs, and create our variable by setting z to the number of non-zeros in each of those runs. Then finally we get rid of our temporary grouping variable.

Upvotes: 1

Related Questions