Reputation: 1420
I have data.table
as following:-
data <- data.table(k = c("a", "a", "a", "a", "b", "b", "c", "c", "c", "d"),
year = c(2011, 2012, 2013, 2014, 2012, 2013, 2014, 2015, 2016, 2001),
grow_bool = c(1, 1, 0, 1, 0, 1, 1, 0, 1, 0))
# k year grow_bool
# 1: a 2011 1
# 2: a 2012 1
# 3: a 2013 0
# 4: a 2014 1
# 5: b 2012 0
# 6: b 2013 1
# 7: c 2014 1
# 8: c 2015 0
# 9: c 2016 1
#10: d 2001 0
Now, I want to remove the rows which have a 0
in the column grow_bool
ans subsequent rows after it for each k
. For example:- a
in 2013
has a 0
in grow_bool
, hence, all this row and all the rows for a
after this row should be deleted. All the rows for b
will be deleted since, the first row for b
consists a 0
. d
has only one row and it will be deleted since, it has 0
.
The resulting data.table
should be of the following form:-
data_2 <- data.table(k = c(a, a, c),
year = c(2011, 2012, 2014),
grow_bool = c(1, 1, 1))
# k year grow_bool
# 1: a 2011 1
# 2: a 2012 1
# 3: c 2014 1
Thanks in advance.
Upvotes: 3
Views: 127
Reputation: 25225
Here is another option using cummin
:
DT[DT[, .I[cummin(value) >= 1L], ID]$V1]
or
DT[DT[, cummin(value) >= 1L, ID]$V1]
Timing depends on the number of groups within your dataset. Here are some timings when there are large number of groups:
library(data.table)
set.seed(0L)
nr <- 1e7L
ng <- 5e5L
DT <- data.table(ID=sample(ng, nr, TRUE), value=sample(0:1, nr, TRUE))
setorder(DT, ID)
microbenchmark::microbenchmark(times=3L,
check=function(values) {
all(sapply(values[-1L], function(x) fsetequal(values[[1L]], x)))
},
DT[!DT[, cumsum(value==0L) > 0L, ID]$V1],
DT[-DT[, .I[cumsum(value==0L)>0L], ID]$V1],
DT[DT[, .I[cumsum(value==0L)==0L], ID]$V1],
DT[DT[, .I[cummin(value) >= 1L], ID]$V1],
DT[DT[, cummin(value) >= 1L, ID]$V1])
timings:
Unit: milliseconds
expr min lq mean median uq max neval
DT[!DT[, cumsum(value == 0L) > 0L, ID]$V1] 766.2710 774.0762 796.6203 781.8814 811.7950 841.7086 3
DT[-DT[, .I[cumsum(value == 0L) > 0L], ID]$V1] 1040.6440 1086.4132 1105.8104 1132.1824 1138.3936 1144.6047 3
DT[DT[, .I[cumsum(value == 0L) == 0L], ID]$V1] 774.5525 828.6320 851.2925 882.7115 889.6626 896.6136 3
DT[DT[, .I[cummin(value) >= 1L], ID]$V1] 723.6663 727.0806 731.0786 730.4950 734.7848 739.0745 3
DT[DT[, cummin(value) >= 1L, ID]$V1] 657.1164 661.7835 664.8258 666.4506 668.6805 670.9103 3
check:
a0 <- DT[!DT[, cumsum(value==0L) > 0L, ID]$V1]
a1 <- DT[DT[, .I[cumsum(value==0L)==0L], ID]$V1]
a2 <- DT[-DT[, .I[cumsum(value==0L)>0L], ID]$V1]
a3 <- DT[DT[, .I[cummin(value) >= 1L], ID]$V1]
a4 <- DT[DT[, cummin(value) >= 1L, ID]$V1]
fsetequal(a0, a1)
#[1] TRUE
fsetequal(a0, a2)
#[1] TRUE
fsetequal(a0, a3)
#[1] TRUE
fsetequal(a0, a4)
#[1] TRUE
Upvotes: 3
Reputation: 8880
library(data.table)
library(magrittr)
data <-
data.table(
k = c("a", "a", "a", "a", "b", "b", "c", "c", "c", "d"),
year = c(2011, 2012, 2013, 2014, 2012, 2013, 2014, 2015, 2016, 2001),
grow_bool = c(1, 1, 0, 1, 0, 1, 1, 0, 1, 0)
)
data[, fltr := cumsum(grow_bool == 0), by = k] %>%
.[fltr == 0] %>%
.[, fltr := NULL] %>%
.[]
#> k year grow_bool
#> 1: a 2011 1
#> 2: a 2012 1
#> 3: c 2014 1
Created on 2021-07-23 by the reprex package (v2.0.0)
Upvotes: 0
Reputation: 27772
data[!data[, cumsum(grow_bool == 0) > 0, by = .(k)]$V1, ]
# k year grow_bool
# 1: a 2011 1
# 2: a 2012 1
# 3: c 2014 1
Upvotes: 6