Any packages or function from R can compute this thing?

Question

I am new from R, please help me. Here are my reproducible example

This is what the sample looks like: 

# A tibble: 10 x 4
   vehicle_id                  location           time     for_hire_light
                                                     
 1 zzJcCfa6nuUF9A02Sud5fASxowM 100.50457_13.90834 05:19:37              0
 2 zzJcCfa6nuUF9A02Sud5fASxowM 100.51297_13.91534 05:21:37              0
 3 zzJcCfa6nuUF9A02Sud5fASxowM 100.51323_13.91548 05:22:37              0
 4 zzJcCfa6nuUF9A02Sud5fASxowM 100.50572_13.90243 05:24:37              0
 5 zzJcCfa6nuUF9A02Sud5fASxowM 100.50717_13.8986  05:25:37              0
 6 zzJcCfa6nuUF9A02Sud5fASxowM 100.50979_13.89154 05:26:37              0
 7 zzJcCfa6nuUF9A02Sud5fASxowM 100.51099_13.88835 05:28:37              0
 8 /+bx80f3gOoPMoFBsS+3xX6jpi8 100.6657_13.90103  22:41:30              1
 9 /+bx80f3gOoPMoFBsS+3xX6jpi8 100.66742_13.90093 22:42:30              1
10 /+bx80f3gOoPMoFBsS+3xX6jpi8 100.66916_13.90055 22:44:30              1

my.df <- structure(list(vehicle_id = c("zzJcCfa6nuUF9A02Sud5fASxowM","zzJcCfa6nuUF9A02Sud5fASxowM", "zzJcCfa6nuUF9A02Sud5fASxowM","zzJcCfa6nuUF9A02Sud5fASxowM", "zzJcCfa6nuUF9A02Sud5fASxowM","zzJcCfa6nuUF9A02Sud5fASxowM", "zzJcCfa6nuUF9A02Sud5fASxowM","/+bx80f3gOoPMoFBsS+3xX6jpi8", "/+bx80f3gOoPMoFBsS+3xX6jpi8","/+bx80f3gOoPMoFBsS+3xX6jpi8"), location = c("100.50457_13.90834","100.51297_13.91534", "100.51323_13.91548", "100.50572_13.90243","100.50717_13.8986", "100.50979_13.89154", "100.51099_13.88835","100.6657_13.90103", "100.66742_13.90093", "100.66916_13.90055"), time = c("05:19:37", "05:21:37", "05:22:37", "05:24:37", "05:25:37","05:26:37", "05:28:37", "22:41:30", "22:42:30", "22:44:30"),for_hire_light = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)), row.names = c(NA,-10L), class = c("tbl_df", "tbl", "data.frame"))

In my raw data, it still has some vehicle id that junk. For example, some vehicle has only "0" value in for_hire_light's column and it doesn't make sense. Can I make some code like I ganna remove the row if they have only "0" value (vehicle id doesn't have any 1 value)? So in each vehicle id, at least it has to start with 1 and end with 1.

I am finding ways to achieve my outputs like: I want to create new trips separate from the "for_hire_light" column. For example, 0 = has passenger and 1 = no passenger. My data is probes gps data and it was collected every 1 minute so In each trip, I want to select the first 0 in each vehicle id as the first start location ( and 0 value will continue until the passenger will get out of the car and that vehicle will turn to 1 value again) and the last 0 before it turns to 1 of that vehicle id is the finish location based on time. That's mean in each vehicle id, it can have several trips based on the "for_hire_light" column.

These are what the results' example that I prefer:

Trip ID	start location	starting time	finish location	finishing time
0001	100.50457_13.90834	05:19:37	100.63465_13.74289	23:59:17
0002	100.63233_13.84901	07:48:16	100.63194_13.84868	23:26:42

Is there any function from R that I should apply?

Ronak Shah · Accepted Answer

You can use data.table rleid to get a unique group number for every trip and select first and last location and times.

library(dplyr)

my.df %>%
  group_by(grp = data.table::rleid(for_hire_light)) %>%
  summarise(start_location = first(location), 
            end_location = last(location), 
            start_time = first(time), 
            end_time = last(time), 
            for_hire_light = first(for_hire_light)) %>%
  filter(for_hire_light == 0) %>%
  select(-for_hire_light, -grp) %>%
  mutate(trip_id = row_number(), .before = 1) -> result

result

Any packages or function from R can compute this thing?

Answers (2)

Related Questions