Reputation: 149
So I'm having a dataset of the following form:
ID Var1 Var2
1 2 0
1 8 0
1 12 0
1 11 1
1 10 1
2 5 0
2 8 0
2 7 0
2 6 1
2 5 1
I would like to subset the dataframe and create a new dataframe, containing only the rows after Var1 first reached its group-maximum (including the row this happens) up to the row where Var2 becomes 1 for the first time (also including this row). So what I'd like to have should look like this:
ID Var1 Var2
1 12 0
1 11 1
2 8 0
2 7 0
2 6 1
The original dataset contains a number of NAs and the function should simply ignore those. Also if Var2 never reaches "1" for a group is should just add all rows to the new dataframe (of course only the ones after Var1 reaches its group maximum). However I cannot wrap my hand around the programming. Does anyone know help?
Upvotes: 1
Views: 504
Reputation: 76450
A dplyr
solution with cumsum
based filter
will do what the question asks for.
library(dplyr)
df1 %>%
group_by(ID) %>%
filter(cumsum(Var1 == max(Var1)) == 1, cumsum(Var2) <= 1)
## A tibble: 5 x 3
## Groups: ID [2]
# ID Var1 Var2
# <int> <int> <int>
#1 1 12 0
#2 1 11 1
#3 2 8 0
#4 2 7 0
#5 2 6 1
Edit
Here is a solution that tries to answer to the OP's comment and question edit.
df1 %>%
group_by(ID) %>%
mutate_at(vars(starts_with('Var')), ~replace_na(., 0L)) %>%
filter(cumsum(Var1 == max(Var1)) == 1, cumsum(Var2) <= 1)
Data
df1 <- read.table(text = "
ID Var1 Var2
1 2 0
1 8 0
1 12 0
1 11 1
1 10 1
2 5 0
2 8 0
2 7 0
2 6 1
2 5 1
", header = TRUE)
Upvotes: 3
Reputation: 887213
Using data.table
with .I
library(data.table)
setDT(df1)[df1[, .I[cumsum(Var1 == max(Var1)) & cumsum(Var2) <= 1], by="ID"]$V1]
# ID Var1 Var2
#1: 1 12 0
#2: 1 11 1
#3: 2 8 0
#4: 2 7 0
#5: 2 6 1
df1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
Var1 = c(2L, 8L, 12L, 11L, 10L, 5L, 8L, 7L, 6L, 5L), Var2 = c(0L,
0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L)), class = "data.frame",
row.names = c(NA,
-10L))
Upvotes: 1
Reputation: 17715
Here is data.table
translation of Rui Barradas' working solution:
library(data.table)
dat <- fread(text = "
ID Var1 Var2
1 2 0
1 8 0
1 12 0
1 11 1
1 10 1
2 5 0
2 8 0
2 7 0
2 6 1
2 5 1
", header = TRUE)
dat[, .SD[cumsum(Var1 == max(Var1)) & cumsum(Var2) <= 1], by="ID"]
Upvotes: 0