zxwjames

Reputation: 265

Select rows within a particular time range

I have a data frame like:

TimeStamp                    Category

2013-11-02 07:57:18 AM         0
2013-11-02 08:07:19 AM         0
2013-11-02 08:07:21 AM         0
2013-11-02 08:07:25 AM         1
2013-11-02 08:07:29 AM         0
2013-11-02 08:08:18 AM         0
2013-11-02 08:09:20 AM         0
2013-11-02 09:04:18 AM         0
2013-11-02 09:05:22 AM         0
2013-11-02 09:07:18 AM         0

What I want to do is to select the ±10-minute time frames around rows where Category is "1".

In this case, because Category = 1 occurs at 2013-11-02 08:07:25 AM, I want to select all rows from 07:57:25 AM to 08:17:25 AM.

What is the best way to handle this task?

In addition, there may be multiple "1"s in each time frame. (The real data frame is more complicated: it contains multiple TimeStamps for different users, i.e. there is another column named "UserID".)
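
For reference, a minimal sketch that rebuilds the sample data shown above in R (the construction itself is mine; only the printed values come from the table):

df <- data.frame(
  TimeStamp = c("2013-11-02 07:57:18 AM", "2013-11-02 08:07:19 AM",
                "2013-11-02 08:07:21 AM", "2013-11-02 08:07:25 AM",
                "2013-11-02 08:07:29 AM", "2013-11-02 08:08:18 AM",
                "2013-11-02 08:09:20 AM", "2013-11-02 09:04:18 AM",
                "2013-11-02 09:05:22 AM", "2013-11-02 09:07:18 AM"),
  Category  = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)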

Upvotes: 7

Views: 3308

Answers (6)

thelatemail

Reputation: 93813

In base R, without lubridate-ing or anything else (assuming you first convert TimeStamp to a POSIXct object):

df$TimeStamp <- as.POSIXct(df$TimeStamp, format = "%Y-%m-%d %I:%M:%S %p")
df[with(df, abs(difftime(TimeStamp[Category==1],TimeStamp,units="mins")) <= 10 ),]

#            TimeStamp Category
#2 2013-11-02 08:07:19        0
#3 2013-11-02 08:07:21        0
#4 2013-11-02 08:07:25        1
#5 2013-11-02 08:07:29        0
#6 2013-11-02 08:08:18        0
#7 2013-11-02 08:09:20        0

If you've got multiple 1's, you'd have to loop over them, like:

check <- with(df, 
  lapply(TimeStamp[Category==1], function(x) abs(difftime(x,TimeStamp,units="mins")) <= 10 ) 
)
df[do.call(pmax, check)==1,]
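
The question also mentions a UserID column; the same idea could be applied per user by splitting first. A rough sketch under that assumption (the UserID handling is not part of the original answer):

# split by user, apply the same window logic within each group, then recombine
res <- do.call(rbind, lapply(split(df, df$UserID), function(d) {
  if (!any(d$Category == 1)) return(d[0, ])   # no Category==1 event for this user
  check <- lapply(d$TimeStamp[d$Category == 1],
                  function(x) abs(difftime(x, d$TimeStamp, units = "mins")) <= 10)
  d[do.call(pmax, check) == 1, ]
}))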

Upvotes: 10

LyzandeR

Reputation: 37879

This seems to work:

Data:

As per @DavidArenburg's comment (and as mentioned in his answer), the right way to convert the timestamp column into a POSIXct object (if it is not one already) is:

df$TimeStamp <- as.POSIXct(df$TimeStamp, format = "%Y-%m-%d %I:%M:%S %p")

Solution:

library(lubridate) #for minutes
library(dplyr)     #for between
pickrows <- function(df) {
  #pick category == 1 rows
  df2 <- df[df$Category==1,]
  #for each timestamp create two variables start and end
  #for +10 and -10 minutes
  #then pick rows between them
  lapply(df2$TimeStamp, function(time) {
      start <- time - minutes(10)
      end   <- time + minutes(10)
      df[between(df$TimeStamp, start, end),]
  })
} 

#run function
pickrows(df)

Output:

> pickrows(df)
[[1]]
            TimeStamp Category
2 2013-11-02 08:07:19        0
3 2013-11-02 08:07:21        0
4 2013-11-02 08:07:25        1
5 2013-11-02 08:07:29        0
6 2013-11-02 08:08:18        0
7 2013-11-02 08:09:20        0

Keep in mind that in the case of multiple Category==1 rows, my function's output will be a list (on this occasion it only has one element), so a do.call(rbind, pickrows(df)) will be needed to combine everything into one data.frame.
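
For example (the unique() step is my addition, in case overlapping ±10-minute windows produce duplicated rows):

do.call(rbind, pickrows(df))
# or, dropping duplicates when windows overlap:
unique(do.call(rbind, pickrows(df)))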

Upvotes: 4

SabDeM

Reputation: 7190

Here is my solution with dplyr and lubridate. Here are the steps:

Find where Category == 1, then add and subtract 10 minutes from that timestamp with lubridate's minutes via a simple c(-1, 1) * minutes(10), and finally use filter to subset based on the two interval endpoints stored in the rang vector.

library(lubridate)
library(dplyr)
wi1 <- which(dat$Category == 1 )
rang <- dat$TimeStamp[wi1] +  c(-1,1) * minutes(10)
dat %>% filter(TimeStamp >= rang[1] & TimeStamp <= rang[2])
            TimeStamp Category
1 2013-11-02 08:07:19        0
2 2013-11-02 08:07:21        0
3 2013-11-02 08:07:25        1
4 2013-11-02 08:07:29        0
5 2013-11-02 08:08:18        0
6 2013-11-02 08:09:20        0

Upvotes: 1

Arun

Reputation: 118779

I personally like the simplicity in the base R answer from @thelatemail. But just for fun, I'll provide another answer using rolling joins in data.table, as opposed to overlapping range joins solution provided by @DavidArenburg.

require(data.table)
dt_1 = dt[Category == 1L]
setkey(dt, TimeStamp)

ix1 = dt[.(dt_1$TimeStamp - 600L), roll=-Inf, which=TRUE] # NOCB
ix2 = dt[.(dt_1$TimeStamp + 600L), roll= Inf, which=TRUE] # LOCF

indices = data.table:::vecseq(ix1, ix2-ix1+1L, NULL) # not exported function
dt[indices]
#              TimeStamp Category
# 1: 2013-11-02 08:07:19        0
# 2: 2013-11-02 08:07:21        0
# 3: 2013-11-02 08:07:25        1
# 4: 2013-11-02 08:07:29        0
# 5: 2013-11-02 08:08:18        0
# 6: 2013-11-02 08:09:20        0

This should work just fine even if you've got more than one cell where Category is 1, AFAICT. It'd be great to wrap this up as a feature for this type of operation in data.table...

PS: refer to the other posts for converting TimeStamp into POSIXct format.
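
For completeness, one way to build dt from the original data frame (assuming the same conversion format used in the other answers):

library(data.table)
dt <- as.data.table(df)
dt[, TimeStamp := as.POSIXct(TimeStamp, format = "%Y-%m-%d %I:%M:%S %p")]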

Upvotes: 3

Pierre L

Reputation: 28441

Using lubridate:

df$TimeStamp <- ymd_hms(df$TimeStamp)
span10 <- (df$TimeStamp[df$Category == 1] - minutes(10)) %--% (df$TimeStamp[df$Category == 1] + minutes(10))
df[df$TimeStamp %within% span10,]
            TimeStamp Category
2 2013-11-02 08:07:19        0
3 2013-11-02 08:07:21        0
4 2013-11-02 08:07:25        1
5 2013-11-02 08:07:29        0
6 2013-11-02 08:08:18        0
7 2013-11-02 08:09:20        0

Upvotes: 4

David Arenburg

Reputation: 92282

Here's how I would approach this using data.table::foverlaps

First, convert TimeStamp to a proper POSIXct

library(data.table)
setDT(df)[, TimeStamp := as.POSIXct(TimeStamp, format = "%Y-%m-%d %I:%M:%S %p")]

Then we will create a temporary data set where Category == 1 to join against. We will also create an "end" column and key by both the "start" and "end" columns:

df2 <- setkey(df[Category == 1L][, TimeStamp2 := TimeStamp], TimeStamp, TimeStamp2)

Then we will do the same for df, but will set 10-minute intervals:

setkey(df[, `:=`(start = TimeStamp - 600, end = TimeStamp + 600)], start, end)

Then, all that is left to do is to run foverlaps and subset by the matched indices:

indx <- foverlaps(df, df2, which = TRUE, nomatch = 0L)$xid
df[indx, .(TimeStamp,  Category)]
#              TimeStamp Category
# 1: 2013-11-02 08:07:19        0
# 2: 2013-11-02 08:07:21        0
# 3: 2013-11-02 08:07:25        1
# 4: 2013-11-02 08:07:29        0
# 5: 2013-11-02 08:08:18        0
# 6: 2013-11-02 08:09:20        0

Upvotes: 7
