Reputation: 23
my data set is as follows:
df <- tibble(id = c("firm a","firm b","firm c","firm d","firm e","firm a","firm e","firm b","firm f","firm g"),
n1 = c(1,1,1,1,1,0,0,0,0,0),
n2 = c(0,0,0,0,0,1,1,1,1,1))
id n1 n2
<chr> <dbl> <dbl>
1 firm a 1 0
2 firm b 1 0
3 firm c 1 0
4 firm d 1 0
5 firm e 1 0
6 firm a 0 1
7 firm e 0 1
8 firm b 0 1
9 firm f 0 1
10 firm g 0 1
I want to filter only the rows where the variable id
is associated with both a value of 1 in variable n1
and a value of 1 in variable n2
.
In this example, the three rows that satisfy this criterion are "firm a", "firm b", and "firm e".
I have tried using the filter
function in dplyr
, but I could not find how to condition one variable based on values in two other variables.
Upvotes: 1
Views: 899
Reputation: 79338
df %>%
group_by(id) %>%
filter(any(n1 == 1), any(n2 == 1))
# A tibble: 6 x 3
# Groups: id [3]
id n1 n2
<chr> <dbl> <dbl>
1 firm a 1 0
2 firm b 1 0
3 firm e 1 0
4 firm a 0 1
5 firm e 0 1
6 firm b 0 1
Upvotes: 0
Reputation: 79246
After arranging column id
we could us lead
function and then filter with if_all
:
library(dplyr)
df %>%
arrange(id) %>%
mutate(n2=lead(n2, default = n2[1])) %>%
filter(if_all(n1:n2, ~ . == 1))
id n1 n2
<chr> <dbl> <dbl>
1 firm a 1 1
2 firm b 1 1
3 firm e 1 1
Upvotes: 2