Reputation: 967
I have the following data frame structure :
id status
a 1
a 2
a 1
b 1
b 1
b 0
b 1
c 0
c 0
c 2
c 1
d 0
d 2
d 0
Here a,b,c are unique id's and status is a flag ranging from 0,1 and 2.
I need to select each individual id whose status has changed from 0 to 1 in any point during the whole time frame, so the expected output of this would be two id's 'b' and 'c'.
I thought of using lag to accomplish that but in that case, I wont't be able to handle id 'c', in which there is a 0 in the beginning but it reaches 1 at some stage. Any thoughts on how we can achieve this using window functions (or any other technique)
Upvotes: 2
Views: 923
Reputation: 4534
One possible way using dplyr
(Edited to include id
only when a 1
appears after a 0
):
library(dplyr)
df %>%
group_by(id) %>%
filter(status %in% c(0, 1)) %>%
filter(status == 0 & lead(status, default = 0) == 1) %>%
select(id) %>% unique()
#> # A tibble: 2 x 1
#> # Groups: id [2]
#> id
#> <chr>
#> 1 b
#> 2 c
Data
df <- read.table(text = "id status
a 1
a 2
a 1
b 1
b 1
b 0
b 1
c 0
c 0
c 2
c 1
d 0
d 2
d 0", header = TRUE, stringsAsFactors = FALSE)
Upvotes: 2
Reputation: 54237
I dunno if this is the most efficient way, but: split by id
, check status
es for 0
, and if there is any, check for 1
behind the 0 index:
lst <- split(df$status, df$id)
f <- function(x) {
if (!any(x==0)) return(FALSE)
any(x[which.max(x==0):length(x)]==1)
}
names(lst)[(sapply(lst, f))]
# [1] "b" "c"
Upvotes: 1
Reputation: 1253
You want to find id's having a status of 1 after having had a status of 0.
Here is a dplyr solution:
library(dplyr)
# Generate data
mydf = data_frame(
id = c(rep("a", 3), rep("b", 4), rep("c", 4), rep("d", 3)),
status = c(1, 2, 1, 1, 1, 0, 1, 0, 0, 2, 1, 0, 2, 0)
)
mydf %>% group_by(id) %>%
# Keep only 0's and 1's
filter(status %in% c(0,1)) %>%
# Compute diff between two status
mutate(dif = status - lag(status, 1)) %>%
# If it is 1, it is a 0 => 1
filter(dif == 1) %>%
# Catch corresponding id's
select(id) %>%
unique
Upvotes: 2