hbabbar
hbabbar

Reputation: 967

How to use window function in R

I have the following data frame structure :

id status
a  1
a  2
a  1
b  1
b  1
b  0
b  1
c  0
c  0
c  2
c  1
d  0
d  2
d  0

Here a,b,c are unique id's and status is a flag ranging from 0,1 and 2.

I need to select each individual id whose status has changed from 0 to 1 in any point during the whole time frame, so the expected output of this would be two id's 'b' and 'c'.

I thought of using lag to accomplish that but in that case, I wont't be able to handle id 'c', in which there is a 0 in the beginning but it reaches 1 at some stage. Any thoughts on how we can achieve this using window functions (or any other technique)

Upvotes: 2

Views: 923

Answers (3)

markdly
markdly

Reputation: 4534

One possible way using dplyr (Edited to include id only when a 1 appears after a 0):

library(dplyr)
df %>% 
  group_by(id) %>% 
  filter(status %in% c(0, 1)) %>%
  filter(status == 0 & lead(status, default = 0) == 1) %>%
  select(id) %>% unique()
#> # A tibble: 2 x 1
#> # Groups:   id [2]
#>      id
#>   <chr>
#> 1     b
#> 2     c

Data

df <- read.table(text = "id status
a  1
a  2
a  1
b  1
b  1
b  0
b  1
c  0
c  0
c  2
c  1
d  0
d  2
d  0", header = TRUE, stringsAsFactors = FALSE)

Upvotes: 2

lukeA
lukeA

Reputation: 54237

I dunno if this is the most efficient way, but: split by id, check statuses for 0, and if there is any, check for 1 behind the 0 index:

lst <- split(df$status, df$id)
f <- function(x) {
  if (!any(x==0)) return(FALSE)
  any(x[which.max(x==0):length(x)]==1)
}
names(lst)[(sapply(lst, f))]
# [1] "b" "c"

Upvotes: 1

jlesuffleur
jlesuffleur

Reputation: 1253

You want to find id's having a status of 1 after having had a status of 0.

Here is a dplyr solution:

library(dplyr)
# Generate data
mydf = data_frame(
  id = c(rep("a", 3), rep("b", 4), rep("c", 4), rep("d", 3)),
  status = c(1, 2, 1, 1, 1, 0, 1, 0, 0, 2, 1, 0, 2, 0)
)

mydf %>% group_by(id) %>%
  # Keep only 0's and 1's
  filter(status %in% c(0,1)) %>% 
  # Compute diff between two status
  mutate(dif = status - lag(status, 1)) %>%
  # If it is 1, it is a 0 => 1
  filter(dif == 1) %>%
  # Catch corresponding id's
  select(id) %>%
  unique

Upvotes: 2

Related Questions