haimen
haimen

Reputation: 2015

Filter for the datetimes that has gap greater than or equal to a number - R

Suppose I have a dataframe as follows,

ID  date           difference
1   5/1/2016 4:46   NA
2   5/1/2016 4:55   9
3   5/1/2016 5:01   6
4   5/1/2016 5:09   8
5   5/1/2016 5:16   7
6   5/1/2016 5:24   8
7   5/1/2016 5:31   7
8   5/1/2016 5:40   9
9   5/1/2016 5:46   6
10  5/1/2016 5:49   3
11  5/1/2016 5:54   5
12  5/1/2016 5:57   3
13  5/1/2016 6:10   13

Here I want to filter for entries that have the time difference greater than or equal to 9. I want to filter two entries before the time difference value is 9 and one entry after that. Basically I want to analyse the trends when the time difference is greater than 9 mins. So I want to take two entries before it and one entry after that, so that I will have 4 entries for each block of time entries to investigate.

I am able to filter it directly with one entry,

data %>% filter(difference >= 9) 

But I want to take two entries before it and one entry after it to investigate. Ideally my output should be like,

ID  date           difference
1   5/1/2016 4:46   NA
2   5/1/2016 4:55   9
3   5/1/2016 5:01   6
6   5/1/2016 5:24   8
7   5/1/2016 5:31   7
8   5/1/2016 5:40   9
9   5/1/2016 5:46   6
11  5/1/2016 5:54   5
12  5/1/2016 5:57   3
13  5/1/2016 6:10   13

Where whenever the time difference is greater than or equal to 9, I get two before entries and one after entry.

Can anybody give me some idea how to approach this?

Thanks

Upvotes: 0

Views: 1835

Answers (2)

Anton
Anton

Reputation: 1538

Note: I used a placeholder series for "date" because it doesn't matter in the context of this code.

There isn't a single-command way of doing what you need, as far as I know. I would suggest doing this with a loop:

df <- data.frame(1:13)
df <- cbind(df,2000:2012,c(NA,9,6,8,7,8,7,9,6,3,5,3,13))
colnames(df) <- c("ID","date","difference")

whichAtLeast9 <- which(df$difference >= 9)
wantedRows <- whichAtLeast9

for (i in whichAtLeast9) {
  entries_before <- c(i-1,i-2)
  entries_before <- subset(entries_before,entries_before>0)
  wantedRows <- append(wantedRows,entries_before)
  entries_after <- i+1
  entries_after <- subset(entries_after,entries_after<=13)
  wantedRows <- append(wantedRows,entries_after)
}

# Remove duplicates and sort.
wantedRows <- sort(unique(wantedRows))

finaldf <- df[wantedRows,]

Result:

> finaldf
   ID date difference
1   1 2000         NA
2   2 2001          9
3   3 2002          6
6   6 2005          8
7   7 2006          7
8   8 2007          9
9   9 2008          6
11 11 2010          5
12 12 2011          3
13 13 2012         13

Upvotes: 2

akuiper
akuiper

Reputation: 214927

You can try the shift, Reduce method in data.table, suppose dt is your data.frame:

library(data.table)
setDT(dt)
dt[Reduce("|", c(shift(difference >= 9, n = 0:1, type = "lag"), 
                 shift(difference >=9, n = 1:2, type = "lead")))]

          ID date difference
 1: 5/1/2016 4:46         NA
 2: 5/1/2016 4:55          9
 3: 5/1/2016 5:01          6
 4: 5/1/2016 5:24          8
 5: 5/1/2016 5:31          7
 6: 5/1/2016 5:40          9
 7: 5/1/2016 5:46          6
 8: 5/1/2016 5:54          5
 9: 5/1/2016 5:57          3
10: 5/1/2016 6:10         13

Upvotes: 1

Related Questions