Reputation: 1626
In a data.table, I am trying to identify the groups that have more than 4 consecutive missing days/rows. Below is a small sample set in which group B has some missing rows.
library(data.table)
dt <- structure(list(date = structure(c(17956L, 17959L, 17960L, 17961L,
17962L, 17963L, 17966L, 17967L, 17968L, 17969L, 17970L, 17973L,
17974L, 17975L, 17976L, 17977L, 17980L, 17981L, 17982L, 17983L,
17984L, 17956L, 17959L, 17960L, 17961L, 17962L, 17963L, 17966L,
17967L, 17968L, 17980L, 17981L, 17982L, 17983L, 17984L), class = c("IDate", "Date")),
group = c("A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A",
"B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B"),
value = c(43.7425,
43.9625, 43.8825, 43.63, 43.125, 43.2275, 44.725, 45.2275, 45.4275,
45.9325, 46.53, 47.005, 46.6325, 47.04, 48.7725, 47.7625, 47.185,
46.6975, 47.1175, 47.18, 47.4875, 12.31, 12.51, 12.7, 12.4, 12.63,
12.93, 13.18, 13.23, 13.35, 14.27, 14.5, 14.25, 13.88, 13.71)),
row.names = c(NA, -35L), class = c("data.table", "data.frame"))
dt
I want to identify that group B has more than 4 consecutive missing dates/rows. Groups whose runs of consecutive missing dates/rows are shorter than 4 days do not need to be isolated.
Thanks!
Upvotes: 2
Views: 72
Reputation: 887891
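We can use .I to get the row numbers of every group that has a gap of more than 3 days between consecutive dates, and then subset dt by those row numbers.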
library(data.table)
dt[dt[, .I[!!sum(diff(date) > 3)], group]$V1]
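The .I idiom above can be read as two steps. The sketch below spells them out on a small made-up table; `toy`, `idx`, and `flagged` are hypothetical names used only for illustration, and `any()` is used where the answer uses `!!sum()` (both are TRUE exactly when at least one gap exceeds 3 days).

```r
library(data.table)

# Toy data (hypothetical): group "x" has daily rows, group "y" jumps 10 days.
toy <- data.table(
  date  = as.IDate("2019-03-01") + c(0:4, 0:1, 11:12),
  group = rep(c("x", "y"), times = c(5, 4))
)

# Step 1: per group, return its row numbers (.I) only if any gap exceeds 3 days.
# .I[TRUE] keeps every row number of the group; .I[FALSE] keeps none.
idx <- toy[, .I[any(diff(date) > 3)], by = group]$V1

# Step 2: use those row numbers to subset the original table.
flagged <- toy[idx]
```

Note that the gap is measured in calendar days, so in the question's data an ordinary Friday-to-Monday weekend is a gap of exactly 3 and is not flagged.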
Upvotes: 2
Reputation: 389265
To count the number of gaps longer than 3 days between consecutive dates in each group:
library(data.table)
dt[, .(n_miss = sum(diff(date) > 3)), group]
# group n_miss
#1: A 0
#2: B 1
To select those groups:
dt[, .SD[sum(diff(date) > 3) > 0], group]
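If only the group names are needed, or the size of the gap matters, a variation along these lines may help. It is a minimal sketch on made-up data, not part of the answer above: `toy`, `gaps`, `max_gap`, and `flagged_groups` are hypothetical names, the threshold of 3 calendar days is carried over from the answer, and every group is assumed to have at least two rows.

```r
library(data.table)

# Toy data (hypothetical): group "x" has daily rows, group "y" jumps 10 days.
toy <- data.table(
  date  = as.IDate("2019-03-01") + c(0:4, 0:1, 11:12),
  group = rep(c("x", "y"), times = c(5, 4))
)

# Largest gap, in calendar days, between consecutive rows of each group.
# as.integer() turns the dates into day counts so diff() returns plain integers.
# Rows are sorted by group and date first so that diff() sees dates in order.
gaps <- toy[order(group, date),
            .(max_gap = max(diff(as.integer(date)))),
            by = group]

# Names of the groups whose largest gap exceeds the threshold.
flagged_groups <- gaps[max_gap > 3, group]
```

Keeping `max_gap` as a column makes it easy to try a different threshold without recomputing the differences.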
Upvotes: 2