Reputation: 322
I am trying to remove runs of consecutive values from a data.table. In this case I want to drop the whole row (all columns) wherever column a contains a run of more than 2 consecutive zeros. So I need something like a maxgap parameter that defines how many consecutive zeros are allowed, for some flexibility.
Here is an example:
library(data.table)
dt <- data.table(a = c(1, 2, 1, 0, 0, 0, 0, 1, 2),
b = as.factor(c("x", "y", "x", "x", "y", "z", "x", "y", "y")),
c = c(2, 5, 1, 0, 3, 6, 0, 3, 4))
The desired result looks like this:
dtRes <- data.table(a = c(1, 2, 1, 1, 2),
b = as.factor(c("x", "y", "x", "y", "y")),
c = c(2, 5, 1, 3, 4))
Upvotes: 3
Views: 146
Reputation: 887223
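We can use rleid. Note that this first version keeps only the rows where a != 0, so it removes every zero regardless of run length; it matches the example but does not apply the more-than-2 threshold.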
library(data.table)
dt[dt[, rleid(a == 0) * (a != 0) > 0]]
# a b c
#1: 1 x 2
#2: 2 y 5
#3: 1 x 1
#4: 1 y 3
#5: 2 y 4
Or with .I, which drops only the runs of more than 2 zeros:
dt[dt[, .I[!(all(a == 0) & .N > 2)], rleid(a == 0)]$V1]
# a b c
#1: 1 x 2
#2: 2 y 5
#3: 1 x 1
#4: 1 y 3
#5: 2 y 4
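Since the question asks for an adjustable threshold, the .I approach can be parameterised. This is only a sketch: the variable name maxgap is taken from the question's wording and is not a data.table argument, and it is assumed to mean the longest run of zeros that should still be kept.

```r
library(data.table)

dt <- data.table(a = c(1, 2, 1, 0, 0, 0, 0, 1, 2),
                 b = as.factor(c("x", "y", "x", "x", "y", "z", "x", "y", "y")),
                 c = c(2, 5, 1, 0, 3, 6, 0, 3, 4))

maxgap <- 2  # assumption: longest run of zeros that is still allowed

# rleid(a == 0) gives one id per run; within each run, keep the row
# indices unless the run consists of zeros and is longer than maxgap
keep <- dt[, .I[!(a[1] == 0 && .N > maxgap)], by = rleid(a == 0)]$V1
res <- dt[keep]
```

With maxgap <- 2 the four-zero run is dropped; raising it to 4 would keep all nine rows.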
Upvotes: 0
Reputation: 389047
Using rle:
library(data.table)
dt[!with(rle(a == 0), rep(values * lengths > 2, lengths))]
# a b c
#1: 1 x 2
#2: 2 y 5
#3: 1 x 1
#4: 1 y 3
#5: 2 y 4
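The same rle logic can be wrapped in a small helper so the allowed run length becomes an argument. The function name long_zero_run and its maxgap argument are made up for this illustration; the example data is also new and chosen so that one short and one long run of zeros appear.

```r
library(data.table)

# Hypothetical helper: TRUE for every element of x that belongs to a
# run of more than `maxgap` consecutive zeros
long_zero_run <- function(x, maxgap = 2) {
  r <- rle(x == 0)
  rep(r$values & r$lengths > maxgap, r$lengths)
}

dt2 <- data.table(a = c(1, 0, 0, 2, 0, 0, 0, 3))

short_kept <- dt2[!long_zero_run(a, maxgap = 2)]  # run of 2 stays, run of 3 goes
all_kept   <- dt2[!long_zero_run(a, maxgap = 3)]  # nothing exceeds 3 zeros
```

Here short_kept$a is 1 0 0 2 3, while all_kept still has all eight rows.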
Upvotes: 3
Reputation: 12559
You can do:
library("data.table")
dt <- data.table(a = c(1, 2, 1, 0, 0, 0, 0, 1, 2),
b = as.factor(c("x", "y", "x", "x", "y", "z", "x", "y", "y")),
c = c(2, 5, 1, 0, 3, 6, 0, 3, 4))
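# v is a run id: it increases every time a == 0 switches between TRUE and FALSE
dt[, v := rleidv(a == 0)][]
# keep a run if it is non-zero or has fewer than 3 rows
dt[, if (a[1] != 0 | .N < 3) .SD, v]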
# v a b c
# 1: 1 1 x 2
# 2: 1 2 y 5
# 3: 1 1 x 1
# 4: 3 1 y 3
# 5: 3 2 y 4
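The approach above adds the helper column v to dt by reference and also shows it in the result. If that is not wanted, one possible variation (a sketch, not part of the original answer) is to compute the run id inside by and drop it from the result afterwards:

```r
library(data.table)

dt <- data.table(a = c(1, 2, 1, 0, 0, 0, 0, 1, 2),
                 b = as.factor(c("x", "y", "x", "x", "y", "z", "x", "y", "y")),
                 c = c(2, 5, 1, 0, 3, 6, 0, 3, 4))

# Group on the run id directly, so dt itself is not modified
res <- dt[, if (a[1] != 0 || .N < 3) .SD, by = .(v = rleid(a == 0))]

# Remove the grouping column from the result only
res[, v := NULL]
```

After this, dt still has only the columns a, b and c, and res holds the five remaining rows without v.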
Upvotes: 0