Reputation: 453
This is what the sample looks like:
# A tibble: 10 x 3
trip_id start_time end_time
<dbl> <dttm> <dttm>
1 1 2020-10-05 20:21:42 2020-10-05 21:04:00
2 2 2020-10-05 08:28:42 2020-10-05 08:58:42
3 3 2020-10-05 18:41:59 2020-10-05 18:50:59
4 4 2020-10-05 11:11:36 2020-10-05 12:12:54
5 5 2020-10-05 19:54:38 2020-10-05 20:28:38
6 6 2020-10-05 16:09:16 2020-10-05 20:00:42
7 7 2020-10-05 09:16:33 2020-10-05 11:16:27
8 23653 2020-10-05 11:33:22 2020-10-05 12:05:06
9 23654 2020-10-05 07:38:07 2020-10-05 07:40:07
10 23655 2020-10-05 11:33:03 2020-10-05 11:42:02
data <- structure(list(trip_id = c(1, 2, 3, 4, 5, 6, 7, 23653, 23654,23655), start_time = structure(c(1601929302, 1601886522, 1601923319,1601896296, 1601927678, 1601914156, 1601889393, 1601897602, 1601883487,1601897583), tzone = "UTC", class = c("POSIXct", "POSIXt")),end_time = structure(c(1601931840, 1601888322, 1601923859,1601899974, 1601929718, 1601928042, 1601896587, 1601899506,1601883607, 1601898122), tzone = "UTC", class = c("POSIXct","POSIXt"))), row.names = c(NA, -10L), class = c("tbl_df","tbl", "data.frame"))
I want to separate row by the different time (start_time
and end_time
columns) into two new results.
(1)The first result is the time between 09.00-18.00
(2)Second is anytime except 09.00-18.00
However, if start_time
and end_time
lead to a different result, then the start_time
column should be the gain for that row.
So, my results should be 2 results separately. For example,
Result (1)
# A tibble: 10 x 3
trip_id start_time end_time
<dbl> <dttm> <dttm>
1 4 2020-10-05 11:11:36 2020-10-05 12:12:54
2 6 2020-10-05 16:09:16 2020-10-05 20:00:42
3 7 2020-10-05 09:16:33 2020-10-05 11:16:27
4 23653 2020-10-05 11:33:22 2020-10-05 12:05:06
5 23655 2020-10-05 11:33:03 2020-10-05 11:42:02
Result (2)
# A tibble: 10 x 3
trip_id start_time end_time
<dbl> <dttm> <dttm>
1 1 2020-10-05 20:21:42 2020-10-05 21:04:00
2 2 2020-10-05 08:28:42 2020-10-05 08:58:42
3 3 2020-10-05 18:41:59 2020-10-05 18:50:59
4 5 2020-10-05 19:54:38 2020-10-05 20:28:38
5 23654 2020-10-05 07:38:07 2020-10-05 07:40:07
Thanks for any help.
Upvotes: 0
Views: 104
Reputation: 887108
We can use
library(dplyr)
subset1 <- data %>%
slice(which(hour(start_time) >= 9 & hour(start_time) <= 17))
Upvotes: 1
Reputation: 21400
A regex solution:
library(dplyr)
library(stringr)
data %>%
mutate(h = as.numeric(str_extract(start_time, "(?<= )\\d+"))) %>%
filter(!(h >= 9 & h <= 17))
For the opposite result just remove !
Upvotes: 2
Reputation: 26218
Directly split the data into two groups using group_split
data %>% group_split(Result = between(hour(start_time), 9, 17))
<list_of<
tbl_df<
trip_id : double
start_time: datetime<UTC>
end_time : datetime<UTC>
Result : logical
>
>[2]>
[[1]]
# A tibble: 5 x 4
trip_id start_time end_time Result
<dbl> <dttm> <dttm> <lgl>
1 1 2020-10-05 20:21:42 2020-10-05 21:04:00 FALSE
2 2 2020-10-05 08:28:42 2020-10-05 08:58:42 FALSE
3 3 2020-10-05 18:41:59 2020-10-05 18:50:59 FALSE
4 5 2020-10-05 19:54:38 2020-10-05 20:28:38 FALSE
5 23654 2020-10-05 07:38:07 2020-10-05 07:40:07 FALSE
[[2]]
# A tibble: 5 x 4
trip_id start_time end_time Result
<dbl> <dttm> <dttm> <lgl>
1 4 2020-10-05 11:11:36 2020-10-05 12:12:54 TRUE
2 6 2020-10-05 16:09:16 2020-10-05 20:00:42 TRUE
3 7 2020-10-05 09:16:33 2020-10-05 11:16:27 TRUE
4 23653 2020-10-05 11:33:22 2020-10-05 12:05:06 TRUE
5 23655 2020-10-05 11:33:03 2020-10-05 11:42:02 TRUE
If you want to save these two data splits use list2env
like this
data %>% group_split(Result = !between(hour(start_time), 9, 17)) %>%
setNames(c('Result_false', 'Result_True')) %>%
list2env(envir = .GlobalEnv)
and you'll get two df
s saved in your environment
Upvotes: 3
Reputation: 101335
A data.table
option
> setDT(data)[between(hour(start_time), 9, 17)]
trip_id start_time end_time
1: 4 2020-10-05 11:11:36 2020-10-05 12:12:54
2: 6 2020-10-05 16:09:16 2020-10-05 20:00:42
3: 7 2020-10-05 09:16:33 2020-10-05 11:16:27
4: 23653 2020-10-05 11:33:22 2020-10-05 12:05:06
5: 23655 2020-10-05 11:33:03 2020-10-05 11:42:02
Upvotes: 1
Reputation: 388982
You can subset the data based on hours of start_time
:
library(dplyr)
library(lubridate)
subset1 <- data %>% filter(between(hour(start_time), 9, 17))
subset2 <- data %>% filter(!between(hour(start_time), 9, 17))
In base R :
data <- transform(data, hour = as.integer(format(start_time, '%H')))
subset1 <- subset(data, hour >= 9 & hour <= 17)
subset2 <- subset(data, !(hour >= 9 & hour <= 17))
Upvotes: 3