R Data.Table Solution for DPLYR Resolution

Question

data1=data.frame("StudentID"=c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,5,6,6,6,6,6,6),
                 "Time"=c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6),
                 "var1"=c(0,0,0,NA,1,2,0,1,2,2,2,2,0,0,NA,1,1,1,NA,0,0,0,0,1,0,0,0,NA,0,0,0,0,0,1,NA,NA))


library(dplyr)
data2 <- group_by(data1, StudentID) %>% 
  slice(seq_len(min(which(var1 == 1), n())))

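After several attempts I am able to obtain 'data2' from 'data1' with dplyr. The rule is simple: for each StudentID, keep every row up to and including the first row where var1 equals 1, and drop everything after it. A student with no var1 equal to 1 keeps all rows. I am looking for an equivalent data.table solution.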
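To make the rule concrete, here is a minimal base R sketch of the index calculation for a single student's var1 vector. The helper name rows_to_keep is hypothetical and not part of the original post; it only mirrors the min(which(var1 == 1), n()) expression used above.

```r
# Hypothetical helper (illustration only): how many leading rows to keep
# for one student's var1 vector.
rows_to_keep <- function(var1) {
  # which() ignores NA comparisons, so NA never counts as a match.
  # If no 1 exists, which() returns integer(0) and min() falls back
  # to the group size, so every row is kept.
  min(which(var1 == 1), length(var1))
}

rows_to_keep(c(0, 0, 0, NA, 1, 2))  # 5: first 1 is in position 5
rows_to_keep(c(0, 1, 2, 2, 2, 2))   # 2: first 1 is in position 2
rows_to_keep(c(0, 0, 0, NA, 0, 0))  # 6: no 1, keep all rows
```

Passing the result to seq_len() gives the row positions to keep within each group.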

akrun · Accepted Answer

If we want a similar option in data.table, we can either subset .SD (the per-group subset of the data) with the same row index

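library(data.table)
# setDT() converts data1 to a data.table by reference; within each StudentID,
# keep rows 1 through the first var1 == 1 (or all .N rows if there is none)
setDT(data1)[, .SD[seq_len(min(which(var1 == 1), .N))], .(StudentID)]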

or compute the row indices with .I, extract them from the resulting column ($V1), and use them to subset the dataset

setDT(data1)[data1[, .I[seq_len(min(which(var1 == 1), .N))], .(StudentID)]$V1]

Or with match, which returns the position of the first 1 in var1, or NA if there is none (hence na.rm = TRUE)

setDT(data1)[, .SD[seq_len(min(match(1, var1), .N, na.rm = TRUE))], .(StudentID)]
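As a quick sanity check, the three variants can be run side by side on the question's data. This is only a sketch: it rebuilds data1 with rep() for brevity, works on a copy made with as.data.table() so the original data.frame is untouched, and the names dt, res_sd, res_i and res_match are made up for illustration.

```r
library(data.table)

# Same values as data1 in the question, built with rep() for brevity
data1 <- data.frame(
  StudentID = rep(1:6, each = 6),
  Time      = rep(1:6, times = 6),
  var1      = c(0, 0, 0, NA, 1, 2,   0, 1, 2, 2, 2, 2,   0, 0, NA, 1, 1, 1,
                NA, 0, 0, 0, 0, 1,   0, 0, 0, NA, 0, 0,  0, 0, 0, 1, NA, NA)
)

dt <- as.data.table(data1)  # a copy; data1 itself stays a data.frame

# Variant 1: subset .SD within each group
res_sd <- dt[, .SD[seq_len(min(which(var1 == 1), .N))], by = StudentID]

# Variant 2: collect global row numbers with .I, then subset once
res_i <- dt[dt[, .I[seq_len(min(which(var1 == 1), .N))], by = StudentID]$V1]

# Variant 3: match() gives the first position of 1, or NA if absent
res_match <- dt[, .SD[seq_len(min(match(1, var1), .N, na.rm = TRUE))],
                by = StudentID]

# Rows kept per student: 5, 2, 4, 6, 6, 4
res_sd[, .N, by = StudentID]

# Each comparison should return TRUE if the variants agree
all.equal(res_sd, res_i)
all.equal(res_sd, res_match)
```

On larger data the .I variant is often preferred, since it avoids materialising .SD for every group, but that is a general rule of thumb rather than something measured here.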
