user8853874
user8853874

Reputation:

finding overlapping time in a group

In each household I want to define an indicator whose determined if a driver is available for a passenger or not. the driver is available if his/her trip start at most 1 hour after passengers's trip.

example:

      household    person     mode       start
           1         1         car        7:20
           1         1         car        8:00
           1         1         car        8:30
           1         2       non-car      7:30
           1         3       non-car      7:15
           1         4         car        7:00
           2         1          car       7:00
           2         2        non-car      9:00

in the first household driver can give a ride since his trip is 30 minutes after than second person also he can give a ride to third person . in the second household no.

output

      household    person     mode       start      indicator
           1         1         car        8:00          1 
           1         2       non-car      7:30          1
           1         3       non-car      7:15          1
           2         1          car       7:00          0
           2         2        non-car      9:00         0

then I want to put these matched rows (indicator is 1) next to each other

output

      household    person     mode       start      indicator   household    person     mode       start      indicator
           1         1         car        8:00          1       2         2       non-car      7:30          1
           1         1         car        8:00          1       3         2       non-car      7:15          1

Upvotes: 3

Views: 70

Answers (1)

akrun
akrun

Reputation: 886938

We convert the 'start' to datetime class with as.POSIXct, grouped by 'household', check whether the differnece of 'start' is less than or equal to 1, coerce the logical to binary with as.integer

library(dplyr)
df1 %>% 
 mutate(start = as.POSIXct(start, format = '%H:%M')) %>% 
 group_by(household) %>% 
 mutate(indicator = as.integer(any(diff(start) <= 1)))
# A tibble: 4 x 5
# Groups:   household [2]
#  household person mode    start               indicator
#      <int>  <int> <chr>   <dttm>                  <int>
#1         1      1 car     2019-09-03 08:00:00         1
#2         1      2 non-car 2019-09-03 07:30:00         1
#3         2      1 car     2019-09-03 07:00:00         0
#4         2      2 non-car 2019-09-03 09:00:00         0

To get the second output, we can make use of the pivot_wider from the devel version of tidyr

df1 %>% 
  mutate(startn = as.POSIXct(start, format = '%H:%M')) %>% 
  group_by(household) %>% 
  mutate(indicator = as.integer(any(diff(startn) <= 1))) %>%  
  filter(indicator == 1) %>% 
  select(-startn) %>% 
  group_by(household) %>% 
  mutate(n = row_number()) %>%
  pivot_wider(names_from = n, values_from = c(household, person, mode, start, indicator))
# A tibble: 1 x 10
#  household_1 household_2 person_1 person_2 mode_1 mode_2  start_1 start_2 indicator_1 indicator_2
#        <int>       <int>    <int>    <int> <chr>  <chr>   <chr>   <chr>         <int>       <int>
#1           1           1        1        2 car    non-car 8:00    7:30              1           1

data

df1 <- structure(list(household = c(1L, 1L, 2L, 2L), person = c(1L, 
2L, 1L, 2L), mode = c("car", "non-car", "car", "non-car"), start = c("8:00", 
"7:30", "7:00", "9:00")), class = "data.frame", row.names = c(NA, 
-4L))

Upvotes: 1

Related Questions