Reputation: 1315
How would I go about filtering a dataset to remove all observations above a specific duration and any grouped observations (focal_id
) below a specific threshold? My question is similar in principle to this OP.
Here is a sample of the dataset:
focal_id time first_time
1 9128 08:40:30 08:40:30
2 9128 08:40:57 08:40:30
3 9128 08:41:27 08:40:30
4 9128 08:41:57 08:40:30
5 9128 08:42:27 08:40:30
6 9128 08:42:57 08:40:30
7 9128 08:43:27 08:40:30
8 9128 08:43:57 08:40:30
9 9128 08:44:17 08:40:30
10 9128 08:44:29 08:40:30
11 9128 08:44:29 08:40:30
12 9128 08:45:06 08:40:30
13 9128 08:45:28 08:40:30
14 9128 08:46:32 08:40:30
15 9128 08:46:58 08:40:30
16 9128 08:47:28 08:40:30
17 9128 08:47:28 08:40:30
18 9128 08:48:00 08:40:30
19 9128 08:48:30 08:40:30
20 9128 08:48:31 08:40:30
21 1022 06:20:02 06:20:02
22 1022 06:20:32 06:20:02
23 1022 06:21:00 06:20:02
24 1022 06:21:34 06:20:02
In this above example, I would want to filter out any observations (i.e., rows) after 7 minutes and 15 seconds (from the first time) and any focal_id
with less than 5 minutes and 30 seconds of observations (i.e., rows) from the first time.
As a first step, I use dplyr to do the following:
filter_dataset<-dataset %>%
group_by(focal_id) %>%
arrange(time) %>%
mutate(first_time=min(time))
There has to be a way to calculate the duration (which I can then use to filter focal_id
with < 5m30s minutes of observations) and also filter observations > 7m15s minutes for focal_id
with durations that are longer than 7 minutes and 15 seconds.
I tried working within the lubridate
package by first setting my time to hms
, but this returned only the seconds. I also tried the duration
function in the lubridate
package, but it returned NAs.
This is my desired final output:
focal_id time duration
1 9128 08:40:30 00:06:58
2 9128 08:40:57 00:06:58
3 9128 08:41:27 00:06:58
4 9128 08:41:57 00:06:58
5 9128 08:42:27 00:06:58
6 9128 08:42:57 00:06:58
7 9128 08:43:27 00:06:58
8 9128 08:43:57 00:06:58
9 9128 08:44:17 00:06:58
10 9128 08:44:29 00:06:58
11 9128 08:44:29 00:06:58
12 9128 08:45:06 00:06:58
13 9128 08:45:28 00:06:58
14 9128 08:46:32 00:06:58
15 9128 08:46:58 00:06:58
16 9128 08:47:28 00:06:58
17 9128 08:47:28 00:06:58
Upvotes: 0
Views: 306
Reputation: 93761
In the code below, if you want to keep duration
as a duration (rather than a time) then remove the as_hms()
.
library(tidyverse)
library(hms)
dataframe %>%
mutate(time=as_hms(time),
first_time=as_hms(first_time)) %>%
filter(time - first_time < 7.25*60) %>%
group_by(focal_id) %>%
mutate(duration = as_hms(max(time - first_time))) %>%
filter(duration > 5.5*60)
focal_id time first_time duration <int> <time> <time> <time> 1 9128 08:40:30 08:40:30 06'58" 2 9128 08:40:57 08:40:30 06'58" 3 9128 08:41:27 08:40:30 06'58" 4 9128 08:41:57 08:40:30 06'58" 5 9128 08:42:27 08:40:30 06'58" 6 9128 08:42:57 08:40:30 06'58" 7 9128 08:43:27 08:40:30 06'58" 8 9128 08:43:57 08:40:30 06'58" 9 9128 08:44:17 08:40:30 06'58" 10 9128 08:44:29 08:40:30 06'58" 11 9128 08:44:29 08:40:30 06'58" 12 9128 08:45:06 08:40:30 06'58" 13 9128 08:45:28 08:40:30 06'58" 14 9128 08:46:32 08:40:30 06'58" 15 9128 08:46:58 08:40:30 06'58" 16 9128 08:47:28 08:40:30 06'58" 17 9128 08:47:28 08:40:30 06'58"
Upvotes: 2