Reputation: 79
My goal is to count the number of rows between rows with specific values. My starting point is this:
id=c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
var1=c('A', 'B', 'E', 'N', 'J', 'B', 'J', 'A', 'C', 'B', 'L', 'M', 'B', 'C', 'F'),
var2=c(1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1 )
)
The goal is to count the number of rows grouped by id, between row with var1='B' and var2='1' and row with var1='B' and var2='2'. So for id=1 there are 3 such rows, and for id=2, there are 2 such rows. My target is this:
id=c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
var1=c('A', 'B', 'E', 'N', 'J', 'B', 'J', 'A', 'C', 'B', 'L', 'M', 'B', 'C', 'F'),
var2=c(1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1 ),
num=c(NA, NA, 1, 2, 3, NA, NA, NA, NA, NA, 1, 2, NA, NA, NA)
)
What I have tried before: Count number of rows between two specific rows I'm still not sure what task the person here is trying to accomplish. I also tried this: Counting number of rows between rows This bit is counting not between 2 rows, but from top row on. I need to count between specified rows. I understand that this task can be accomplished in a variety of ways. Such as, I can just select only the rows that I need to count and discard anything else. Or I can count within the dataframe that I have. But I'm not sure how to do either of these.
Upvotes: 1
Views: 748
Reputation: 8110
You could try something like this:
library(tidyverse)
my_df <- tibble(
id=c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
var1=c('A', 'B', 'E', 'N', 'J', 'B', 'J', 'A', 'C', 'B', 'L', 'M', 'B', 'C', 'F'),
var2=c(1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1 )
)
my_df %>%
mutate(rw_nm = row_number()) %>%
filter(var1 == "B" & var2 %in% c(1,2))%>%
group_by(id) %>%
summarise(count = (rw_nm[[2]] - rw_nm[[1]])-1)
#> # A tibble: 2 x 2
#> id count
#> <dbl> <dbl>
#> 1 1 3
#> 2 2 2
Upvotes: 1