Filter dataset for duration of time since first observation R

Question

How would I go about filtering a dataset to remove all observations above a specific duration and any grouped observations (focal_id) below a specific threshold? My question is similar in principle to this OP.

Here is a sample of the dataset:

   focal_id time     first_time
 1     9128 08:40:30 08:40:30  
 2     9128 08:40:57 08:40:30  
 3     9128 08:41:27 08:40:30  
 4     9128 08:41:57 08:40:30  
 5     9128 08:42:27 08:40:30  
 6     9128 08:42:57 08:40:30  
 7     9128 08:43:27 08:40:30  
 8     9128 08:43:57 08:40:30  
 9     9128 08:44:17 08:40:30  
10     9128 08:44:29 08:40:30  
11     9128 08:44:29 08:40:30  
12     9128 08:45:06 08:40:30  
13     9128 08:45:28 08:40:30  
14     9128 08:46:32 08:40:30  
15     9128 08:46:58 08:40:30  
16     9128 08:47:28 08:40:30  
17     9128 08:47:28 08:40:30  
18     9128 08:48:00 08:40:30  
19     9128 08:48:30 08:40:30  
20     9128 08:48:31 08:40:30 
21     1022 06:20:02 06:20:02
22     1022 06:20:32 06:20:02
23     1022 06:21:00 06:20:02
24     1022 06:21:34 06:20:02

In this above example, I would want to filter out any observations (i.e., rows) after 7 minutes and 15 seconds (from the first time) and any focal_id with less than 5 minutes and 30 seconds of observations (i.e., rows) from the first time.

As a first step, I use dplyr to do the following:

filter_dataset<-dataset %>%
    group_by(focal_id) %>%
    arrange(time)  %>%
    mutate(first_time=min(time))

There has to be a way to calculate the duration (which I can then use to filter focal_id with < 5m30s minutes of observations) and also filter observations > 7m15s minutes for focal_id with durations that are longer than 7 minutes and 15 seconds.

I tried working within the lubridate package by first setting my time to hms, but this returned only the seconds. I also tried the duration function in the lubridate package, but it returned NAs.

This is my desired final output:

   focal_id     time   duration
1      9128 08:40:30   00:06:58
2      9128 08:40:57   00:06:58
3      9128 08:41:27   00:06:58
4      9128 08:41:57   00:06:58
5      9128 08:42:27   00:06:58
6      9128 08:42:57   00:06:58
7      9128 08:43:27   00:06:58
8      9128 08:43:57   00:06:58
9      9128 08:44:17   00:06:58
10     9128 08:44:29   00:06:58
11     9128 08:44:29   00:06:58
12     9128 08:45:06   00:06:58
13     9128 08:45:28   00:06:58
14     9128 08:46:32   00:06:58
15     9128 08:46:58   00:06:58
16     9128 08:47:28   00:06:58
17     9128 08:47:28   00:06:58

eipi10 · Accepted Answer

In the code below, if you want to keep duration as a duration (rather than a time) then remove the as_hms().

library(tidyverse)
library(hms)

dataframe %>% 
  mutate(time=as_hms(time),
         first_time=as_hms(first_time)) %>% 
  filter(time - first_time < 7.25*60) %>%
  group_by(focal_id) %>%
  mutate(duration = as_hms(max(time - first_time))) %>% 
  filter(duration > 5.5*60)

  focal_id time     first_time duration
                
 1    9128 08:40:30 08:40:30   06'58"  
 2    9128 08:40:57 08:40:30   06'58"  
 3    9128 08:41:27 08:40:30   06'58"  
 4    9128 08:41:57 08:40:30   06'58"  
 5    9128 08:42:27 08:40:30   06'58"  
 6    9128 08:42:57 08:40:30   06'58"  
 7    9128 08:43:27 08:40:30   06'58"  
 8    9128 08:43:57 08:40:30   06'58"  
 9    9128 08:44:17 08:40:30   06'58"  
10    9128 08:44:29 08:40:30   06'58"  
11    9128 08:44:29 08:40:30   06'58"  
12    9128 08:45:06 08:40:30   06'58"  
13    9128 08:45:28 08:40:30   06'58"  
14    9128 08:46:32 08:40:30   06'58"  
15    9128 08:46:58 08:40:30   06'58"  
16    9128 08:47:28 08:40:30   06'58"  
17    9128 08:47:28 08:40:30   06'58"

Filter dataset for duration of time since first observation R

Answers (1)

Related Questions