Reputation: 531
I have a col1 index of family , col2 index of persons in each family , col3 mode of travel of each person, col4 and col5 start and end time of activity how I can choose some row whose in each family, driver and passenger have the same start time and end time?
example:
family persons mode start time end time
1 1 driver 3 8:45
1 1 walk 8:45 13:30
1 1 bus 13:30 15
1 1 driver 15:00 15:30
1 2 walk 15:00 15:30
1 2 driver 22:00 8:30
1 3 passenger 15:00 15:30
1 3 walk 8:00 17:00
1 4 bus 17:00 24:00
1 4 passenger 15:00 15:30
1 4 walk 23:00 24:00
2 1 driver 8:00 10:00
2 1 driver 23:00 24:00
2 2 passenger 23:00 24:00
In the first family first and second persons are driver , third and forth persons are passenger, the start and end time of both passenger is match by first driver (15:00 t0 15:30) so I need to keep 3 rows :( 4,7,10)
second family driver and passenger are matched also . so the output is:
family persons mode start time end time
1 1 driver 15:00 15:30
1 3 passenger 15:00 15:30
1 4 passenger 15:00 15:30
2 1 drive 23:00 24:00
2 2 passenger 23:00 24:00
Remark: second person in first family has the same start and end time as passenger but her mode is walk and not driver, so we don't need that row
Upvotes: 0
Views: 47
Reputation: 886938
An option is to group by 'family', 'starttime', 'endtime' and filter
the number of rows greater than 1
library(dplyr)
df1 %>%
group_by(family, starttime, endtime) %>%
filter(n() > 1)
# A tibble: 5 x 5
# Groups: family, starttime, endtime [2]
# family persons mode starttime endtime
# <int> <int> <chr> <chr> <chr>
#1 1 1 driver 15:00 15:30
#2 1 3 passenger 15:00 15:30
#3 1 4 passenger 15:00 15:30
#4 2 1 drive 23:00 24:00
#5 2 2 passenger 23:00 24:00
If we need to do this only for selected 'mode', use %in%
with addditional logical along with the existing one
df1 %>%
group_by(family, starttime, endtime) %>%
filter(n() >1, mode %in% c("driver", "passenger"))
Or in base R
, an option is to create logical vector with duplicated
df1[duplicated(df1[c(1, 4, 5)])|duplicated(df1[c(1, 4, 5)], fromLast = TRUE),]
df1 <- structure(list(family = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L), persons = c(1L, 1L, 1L, 1L, 2L, 2L, 3L,
3L, 4L, 4L, 4L, 1L, 1L, 2L), mode = c("driver", "walk", "bus",
"driver", "driver", "driver", "passenger", "walk", "bus", "passenger",
"walk", "drive", "drive", "passenger"), starttime = c("3", "8:45",
"13:30", "15:00", "20:00", "22:00", "15:00", "8:00", "17:00",
"15:00", "23:00", "8:00", "23:00", "23:00"), endtime = c("8:45",
"13:30", "15", "15:30", "22:00", "8:30", "15:30", "17:00", "24:00",
"15:30", "24:00", "10:00", "24:00", "24:00")),
class = "data.frame", row.names = c(NA,
-14L))
Upvotes: 2