Saurabh

Identifying groups with missing dates in R

In a data.table, I am trying to identify the groups that have more than 4 consecutive missing days (rows). Below is a small sample in which group B has some missing rows.

library(data.table)
dt <- structure(list(date = structure(c(17956L, 17959L, 17960L, 17961L, 
                                  17962L, 17963L, 17966L, 17967L, 17968L, 17969L, 17970L, 17973L, 
                                  17974L, 17975L, 17976L, 17977L, 17980L, 17981L, 17982L, 17983L, 
                                  17984L, 17956L, 17959L, 17960L, 17961L, 17962L, 17963L, 17966L, 
                                  17967L, 17968L, 17980L, 17981L, 17982L, 17983L, 17984L), class = c("IDate", "Date")), 
               group = c("A", "A", "A", "A", "A", 
                          "A", "A", "A", "A", "A", "A", "A", "A", 
                          "A", "A", "A", "A", "A", "A", "A", "A", 
                          "B", "B", "B", "B", "B", "B", "B", "B", 
                          "B", "B", "B", "B", "B", "B"), 
               value = c(43.7425, 
                         43.9625, 43.8825, 43.63, 43.125, 43.2275, 44.725, 45.2275, 45.4275, 
                         45.9325, 46.53, 47.005, 46.6325, 47.04, 48.7725, 47.7625, 47.185, 
                         46.6975, 47.1175, 47.18, 47.4875, 12.31, 12.51, 12.7, 12.4, 12.63, 
                         12.93, 13.18, 13.23, 13.35, 14.27, 14.5, 14.25, 13.88, 13.71)), 
          row.names = c(NA, -35L), class = c("data.table", "data.frame"))
dt

I want to identify that group B has more than 4 consecutive missing dates (rows). Groups whose runs of consecutive missing dates are shorter than that need not be isolated.
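The answers below flag groups by the size of the gap between consecutive dates. A different technique, which counts missing rows directly, is to expand each group to a full weekday calendar and measure the longest run of absent dates with rle(). This is a minimal sketch, assuming each group should have one row per weekday (Monday to Friday) between its first and last date and that weekends never count as missing. longest_missing_run() is a hypothetical helper, and the first lines rebuild only the date and group columns of the sample so the block runs on its own.

```r
library(data.table)

# Rebuild the date/group columns of the sample (value omitted); skip if dt exists.
days <- structure(c(17956L, 17959:17963, 17966:17970, 17973:17977, 17980:17984),
                  class = c("IDate", "Date"))
dt <- data.table(date  = days[c(1:21, 1:9, 17:21)],
                 group = rep(c("A", "B"), c(21L, 14L)))

# Hypothetical helper. Assumption: one row is expected per weekday between the
# group's first and last date, so Saturdays and Sundays are never "missing".
longest_missing_run <- function(dates) {
  dates <- as.Date(dates)
  full <- seq(min(dates), max(dates), by = "day")
  full <- full[!as.POSIXlt(full)$wday %in% c(0L, 6L)]  # drop Sunday (0) and Saturday (6)
  missing <- !(full %in% dates)                         # TRUE where a weekday has no row
  r <- rle(missing)
  if (any(r$values)) max(r$lengths[r$values]) else 0L  # longest run of TRUE values
}

longest <- dt[, .(longest_gap = longest_missing_run(date)), by = group]
longest                          # longest run of consecutive missing weekdays per group
longest[longest_gap > 4, group]  # groups with more than 4 missing weekdays in a row
```

With this sample, group B has no rows from 2019-03-14 through 2019-03-22, which is 7 consecutive weekdays, while group A is complete, so only "B" is returned.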

Thanks!

Answers (2)

akrun

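We can use .I to collect the row indices of every group that has at least one gap of more than 3 days between consecutive dates, and then subset dt by those indices.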

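library(data.table)
# For each group, keep all of its row numbers (.I) when at least one gap between
# consecutive dates exceeds 3 days (!! turns the count into TRUE/FALSE), else none;
# the indices come back in column V1 and are used to subset dt
dt[dt[, .I[!!sum(diff(date) > 3)], group]$V1]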
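The same .I idea can be wrapped in a small function so the gap threshold is adjustable. This is a sketch, not part of the answer: rows_with_gaps() and its max_diff argument are hypothetical names, any() stands in for !!sum() (both are TRUE when at least one gap qualifies), and the rows are sorted first because diff() assumes dates are in order within each group. The sample's date and group columns are rebuilt so the block runs on its own.

```r
library(data.table)

# Rebuild the date/group columns of the sample (value omitted); skip if dt exists.
days <- structure(c(17956L, 17959:17963, 17966:17970, 17973:17977, 17980:17984),
                  class = c("IDate", "Date"))
dt <- data.table(date  = days[c(1:21, 1:9, 17:21)],
                 group = rep(c("A", "B"), c(21L, 14L)))

# Hypothetical helper: return the rows of every group in which some gap between
# consecutive dates is larger than max_diff days.
rows_with_gaps <- function(x, max_diff = 3) {
  x <- x[order(group, date)]  # diff() needs dates in order within each group
  idx <- x[, .I[any(diff(date) > max_diff)], by = group]$V1
  x[idx]                      # idx refers to rows of the sorted copy
}

out <- rows_with_gaps(dt)  # max_diff = 3 lets a Friday to Monday weekend (3 days) pass
unique(out$group)
```

The default of 3 matches the answer's threshold; raising or lowering max_diff changes how long a gap must be before a group is flagged.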

Ronak Shah

To count, in each group, the gaps between consecutive dates that are longer than 3 days:

library(data.table)
dt[, .(n_miss = sum(diff(date) > 3)), group]

#   group n_miss
#1:     A      0
#2:     B      1
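Since diff() on dates returns the gap in days, a gap of d days between two consecutive rows means d - 1 calendar days with no data. So diff(date) > 3 flags stretches of 3 or more missing calendar days, while an ordinary Friday to Monday weekend (d = 3, two missing days) is not counted. A sketch that also reports the widest gap per group may help in choosing a threshold; the max_gap and max_missing column names are illustrative, and the sample is rebuilt without its value column.

```r
library(data.table)

# Rebuild the date/group columns of the sample (value omitted); skip if dt exists.
days <- structure(c(17956L, 17959:17963, 17966:17970, 17973:17977, 17980:17984),
                  class = c("IDate", "Date"))
dt <- data.table(date  = days[c(1:21, 1:9, 17:21)],
                 group = rep(c("A", "B"), c(21L, 14L)))

# Per group: number of gaps longer than 3 days and the widest gap in days
gaps <- dt[order(group, date),
           .(n_miss  = sum(diff(date) > 3),
             max_gap = as.integer(max(diff(date)))),
           by = group]
# Calendar days with no data inside the widest gap (d - 1)
gaps[, max_missing := max_gap - 1L]
gaps
```

For the sample, group B's widest gap runs from 2019-03-13 to 2019-03-25 (12 days, so 11 calendar days without data), while group A's widest gap is a 3-day weekend.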

To select the rows of those groups:

dt[, .SD[sum(diff(date) > 3) > 0], by = group]
