Reputation: 2015
Suppose I have a dataframe as follows,
ID date difference
1 5/1/2016 4:46 NA
2 5/1/2016 4:55 9
3 5/1/2016 5:01 6
4 5/1/2016 5:09 8
5 5/1/2016 5:16 7
6 5/1/2016 5:24 8
7 5/1/2016 5:31 7
8 5/1/2016 5:40 9
9 5/1/2016 5:46 6
10 5/1/2016 5:49 3
11 5/1/2016 5:54 5
12 5/1/2016 5:57 3
13 5/1/2016 6:10 13
Here I want to filter for entries that have the time difference greater than or equal to 9. I want to filter two entries before the time difference value is 9 and one entry after that. Basically I want to analyse the trends when the time difference is greater than 9 mins. So I want to take two entries before it and one entry after that, so that I will have 4 entries for each block of time entries to investigate.
I am able to filter it directly with one entry,
data %>% filter(difference >= 9)
But I want to take two entries before it and one entry after it to investigate. Ideally my output should be like,
ID date difference
1 5/1/2016 4:46 NA
2 5/1/2016 4:55 9
3 5/1/2016 5:01 6
6 5/1/2016 5:24 8
7 5/1/2016 5:31 7
8 5/1/2016 5:40 9
9 5/1/2016 5:46 6
11 5/1/2016 5:54 5
12 5/1/2016 5:57 3
13 5/1/2016 6:10 13
Where whenever the time difference is greater than or equal to 9, I get two before entries and one after entry.
Can anybody give me some idea how to approach this?
Thanks
Upvotes: 0
Views: 1835
Reputation: 1538
Note: I used a placeholder series for "date" because it doesn't matter in the context of this code.
There isn't a single-command way of doing what you need, as far as I know. I would suggest doing this with a loop:
df <- data.frame(1:13)
df <- cbind(df,2000:2012,c(NA,9,6,8,7,8,7,9,6,3,5,3,13))
colnames(df) <- c("ID","date","difference")
whichAtLeast9 <- which(df$difference >= 9)
wantedRows <- whichAtLeast9
for (i in whichAtLeast9) {
entries_before <- c(i-1,i-2)
entries_before <- subset(entries_before,entries_before>0)
wantedRows <- append(wantedRows,entries_before)
entries_after <- i+1
entries_after <- subset(entries_after,entries_after<=13)
wantedRows <- append(wantedRows,entries_after)
}
# Remove duplicates and sort.
wantedRows <- sort(unique(wantedRows))
finaldf <- df[wantedRows,]
Result:
> finaldf
ID date difference
1 1 2000 NA
2 2 2001 9
3 3 2002 6
6 6 2005 8
7 7 2006 7
8 8 2007 9
9 9 2008 6
11 11 2010 5
12 12 2011 3
13 13 2012 13
Upvotes: 2
Reputation: 214927
You can try the shift
, Reduce
method in data.table
, suppose dt
is your data.frame:
library(data.table)
setDT(dt)
dt[Reduce("|", c(shift(difference >= 9, n = 0:1, type = "lag"),
shift(difference >=9, n = 1:2, type = "lead")))]
ID date difference
1: 5/1/2016 4:46 NA
2: 5/1/2016 4:55 9
3: 5/1/2016 5:01 6
4: 5/1/2016 5:24 8
5: 5/1/2016 5:31 7
6: 5/1/2016 5:40 9
7: 5/1/2016 5:46 6
8: 5/1/2016 5:54 5
9: 5/1/2016 5:57 3
10: 5/1/2016 6:10 13
Upvotes: 1