sherek_66
sherek_66

Reputation: 531

how to chose 2 row of a group whose have the same value in some column and different value in one column?

I have a col1 index of family , col2 index of persons in each family , col3 mode of travel of each person, col4 and col5 start and end time of activity how I can choose some row whose in each family, driver and passenger have the same start time and end time?

example:

family  persons    mode          start time   end time
   1      1        driver           3            8:45
   1      1        walk             8:45         13:30
   1      1        bus              13:30        15
   1      1        driver           15:00        15:30
   1      2        walk             15:00        15:30
   1      2        driver           22:00        8:30
   1      3        passenger        15:00        15:30
   1      3        walk             8:00         17:00
   1      4        bus              17:00        24:00
   1      4        passenger        15:00        15:30
   1      4        walk             23:00        24:00   
   2      1        driver            8:00         10:00   
   2      1        driver            23:00        24:00   
   2      2        passenger        23:00        24:00     

In the first family first and second persons are driver , third and forth persons are passenger, the start and end time of both passenger is match by first driver (15:00 t0 15:30) so I need to keep 3 rows :( 4,7,10)

second family driver and passenger are matched also . so the output is:

family  persons    mode          start time   end time
   1      1        driver           15:00        15:30
   1      3        passenger        15:00        15:30
   1      4        passenger        15:00        15:30  
   2      1        drive            23:00        24:00   
   2      2        passenger        23:00        24:00     

Remark: second person in first family has the same start and end time as passenger but her mode is walk and not driver, so we don't need that row

Upvotes: 0

Views: 47

Answers (1)

akrun
akrun

Reputation: 886938

An option is to group by 'family', 'starttime', 'endtime' and filter the number of rows greater than 1

library(dplyr)
df1 %>% 
   group_by(family, starttime, endtime) %>% 
   filter(n() > 1)
# A tibble: 5 x 5
# Groups:   family, starttime, endtime [2]
#  family persons mode      starttime endtime
#   <int>   <int> <chr>     <chr>     <chr>  
#1      1       1 driver    15:00     15:30  
#2      1       3 passenger 15:00     15:30  
#3      1       4 passenger 15:00     15:30  
#4      2       1 drive     23:00     24:00  
#5      2       2 passenger 23:00     24:00  

If we need to do this only for selected 'mode', use %in% with addditional logical along with the existing one

df1 %>% 
     group_by(family, starttime, endtime) %>%
     filter(n() >1,  mode %in% c("driver", "passenger"))

Or in base R, an option is to create logical vector with duplicated

df1[duplicated(df1[c(1, 4, 5)])|duplicated(df1[c(1, 4, 5)], fromLast = TRUE),]

data

df1 <- structure(list(family = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L), persons = c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 
3L, 4L, 4L, 4L, 1L, 1L, 2L), mode = c("driver", "walk", "bus", 
"driver", "driver", "driver", "passenger", "walk", "bus", "passenger", 
"walk", "drive", "drive", "passenger"), starttime = c("3", "8:45", 
"13:30", "15:00", "20:00", "22:00", "15:00", "8:00", "17:00", 
"15:00", "23:00", "8:00", "23:00", "23:00"), endtime = c("8:45", 
"13:30", "15", "15:30", "22:00", "8:30", "15:30", "17:00", "24:00", 
"15:30", "24:00", "10:00", "24:00", "24:00")), 
class = "data.frame", row.names = c(NA, 
-14L))

Upvotes: 2

Related Questions