Yasumin
Yasumin

Reputation: 453

How to select a subset of rows in R

This is what the sample looks like:

# A tibble: 10 x 3
   trip_id start_time          end_time           
     <dbl> <dttm>              <dttm>             
 1       1 2020-10-05 20:21:42 2020-10-05 21:04:00
 2       2 2020-10-05 08:28:42 2020-10-05 08:58:42
 3       3 2020-10-05 18:41:59 2020-10-05 18:50:59
 4       4 2020-10-05 11:11:36 2020-10-05 12:12:54
 5       5 2020-10-05 19:54:38 2020-10-05 20:28:38
 6       6 2020-10-05 16:09:16 2020-10-05 20:00:42
 7       7 2020-10-05 09:16:33 2020-10-05 11:16:27
 8   23653 2020-10-05 11:33:22 2020-10-05 12:05:06
 9   23654 2020-10-05 07:38:07 2020-10-05 07:40:07
10   23655 2020-10-05 11:33:03 2020-10-05 11:42:02
data <- structure(list(trip_id = c(1, 2, 3, 4, 5, 6, 7, 23653, 23654,23655), start_time = structure(c(1601929302, 1601886522, 1601923319,1601896296, 1601927678, 1601914156, 1601889393, 1601897602, 1601883487,1601897583), tzone = "UTC", class = c("POSIXct", "POSIXt")),end_time = structure(c(1601931840, 1601888322, 1601923859,1601899974, 1601929718, 1601928042, 1601896587, 1601899506,1601883607, 1601898122), tzone = "UTC", class = c("POSIXct","POSIXt"))), row.names = c(NA, -10L), class = c("tbl_df","tbl", "data.frame")) 

I want to separate row by the different time (start_time and end_time columns) into two new results.

(1)The first result is the time between 09.00-18.00

(2)Second is anytime except 09.00-18.00

However, if start_time and end_time lead to a different result, then the start_time column should be the gain for that row.

So, my results should be 2 results separately. For example,

Result (1)

# A tibble: 10 x 3
   trip_id start_time          end_time           
     <dbl> <dttm>              <dttm>             
 1       4 2020-10-05 11:11:36 2020-10-05 12:12:54
 2       6 2020-10-05 16:09:16 2020-10-05 20:00:42
 3       7 2020-10-05 09:16:33 2020-10-05 11:16:27
 4   23653 2020-10-05 11:33:22 2020-10-05 12:05:06
 5   23655 2020-10-05 11:33:03 2020-10-05 11:42:02

Result (2)

# A tibble: 10 x 3
   trip_id start_time          end_time           
     <dbl> <dttm>              <dttm>             
 1       1 2020-10-05 20:21:42 2020-10-05 21:04:00
 2       2 2020-10-05 08:28:42 2020-10-05 08:58:42
 3       3 2020-10-05 18:41:59 2020-10-05 18:50:59
 4       5 2020-10-05 19:54:38 2020-10-05 20:28:38
 5   23654 2020-10-05 07:38:07 2020-10-05 07:40:07

Thanks for any help.

Upvotes: 0

Views: 104

Answers (5)

akrun
akrun

Reputation: 887108

We can use

library(dplyr)
subset1 <- data %>%
               slice(which(hour(start_time) >= 9 & hour(start_time) <= 17))

Upvotes: 1

Chris Ruehlemann
Chris Ruehlemann

Reputation: 21400

A regex solution:

library(dplyr)
library(stringr)
data %>%
  mutate(h = as.numeric(str_extract(start_time, "(?<= )\\d+"))) %>%
  filter(!(h >= 9 & h <= 17))

For the opposite result just remove !

Upvotes: 2

AnilGoyal
AnilGoyal

Reputation: 26218

Directly split the data into two groups using group_split

data %>% group_split(Result = between(hour(start_time), 9, 17))

<list_of<
  tbl_df<
    trip_id   : double
    start_time: datetime<UTC>
    end_time  : datetime<UTC>
    Result    : logical
  >
>[2]>
[[1]]
# A tibble: 5 x 4
  trip_id start_time          end_time            Result
    <dbl> <dttm>              <dttm>              <lgl> 
1       1 2020-10-05 20:21:42 2020-10-05 21:04:00 FALSE 
2       2 2020-10-05 08:28:42 2020-10-05 08:58:42 FALSE 
3       3 2020-10-05 18:41:59 2020-10-05 18:50:59 FALSE 
4       5 2020-10-05 19:54:38 2020-10-05 20:28:38 FALSE 
5   23654 2020-10-05 07:38:07 2020-10-05 07:40:07 FALSE 

[[2]]
# A tibble: 5 x 4
  trip_id start_time          end_time            Result
    <dbl> <dttm>              <dttm>              <lgl> 
1       4 2020-10-05 11:11:36 2020-10-05 12:12:54 TRUE  
2       6 2020-10-05 16:09:16 2020-10-05 20:00:42 TRUE  
3       7 2020-10-05 09:16:33 2020-10-05 11:16:27 TRUE  
4   23653 2020-10-05 11:33:22 2020-10-05 12:05:06 TRUE  
5   23655 2020-10-05 11:33:03 2020-10-05 11:42:02 TRUE 

If you want to save these two data splits use list2env like this

data %>% group_split(Result = !between(hour(start_time), 9, 17)) %>%
  setNames(c('Result_false', 'Result_True')) %>%
  list2env(envir = .GlobalEnv)

and you'll get two dfs saved in your environment

Upvotes: 3

ThomasIsCoding
ThomasIsCoding

Reputation: 101335

A data.table option

> setDT(data)[between(hour(start_time), 9, 17)]
   trip_id          start_time            end_time
1:       4 2020-10-05 11:11:36 2020-10-05 12:12:54
2:       6 2020-10-05 16:09:16 2020-10-05 20:00:42
3:       7 2020-10-05 09:16:33 2020-10-05 11:16:27
4:   23653 2020-10-05 11:33:22 2020-10-05 12:05:06
5:   23655 2020-10-05 11:33:03 2020-10-05 11:42:02

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388982

You can subset the data based on hours of start_time :

library(dplyr)
library(lubridate)

subset1 <- data %>% filter(between(hour(start_time), 9, 17))
subset2 <- data %>% filter(!between(hour(start_time), 9, 17))

In base R :

data <- transform(data, hour = as.integer(format(start_time, '%H')))
subset1 <- subset(data, hour >= 9 & hour <= 17)
subset2 <- subset(data, !(hour >= 9 & hour <= 17))

Upvotes: 3

Related Questions