Yasumin
Yasumin

Reputation: 453

Any packages or function from R can compute this thing?

I am new from R, please help me. Here are my reproducible example

This is what the sample looks like: 

# A tibble: 10 x 4
   vehicle_id                  location           time     for_hire_light
   <chr>                       <chr>              <chr>             <dbl>
 1 zzJcCfa6nuUF9A02Sud5fASxowM 100.50457_13.90834 05:19:37              0
 2 zzJcCfa6nuUF9A02Sud5fASxowM 100.51297_13.91534 05:21:37              0
 3 zzJcCfa6nuUF9A02Sud5fASxowM 100.51323_13.91548 05:22:37              0
 4 zzJcCfa6nuUF9A02Sud5fASxowM 100.50572_13.90243 05:24:37              0
 5 zzJcCfa6nuUF9A02Sud5fASxowM 100.50717_13.8986  05:25:37              0
 6 zzJcCfa6nuUF9A02Sud5fASxowM 100.50979_13.89154 05:26:37              0
 7 zzJcCfa6nuUF9A02Sud5fASxowM 100.51099_13.88835 05:28:37              0
 8 /+bx80f3gOoPMoFBsS+3xX6jpi8 100.6657_13.90103  22:41:30              1
 9 /+bx80f3gOoPMoFBsS+3xX6jpi8 100.66742_13.90093 22:42:30              1
10 /+bx80f3gOoPMoFBsS+3xX6jpi8 100.66916_13.90055 22:44:30              1
my.df <- structure(list(vehicle_id = c("zzJcCfa6nuUF9A02Sud5fASxowM","zzJcCfa6nuUF9A02Sud5fASxowM", "zzJcCfa6nuUF9A02Sud5fASxowM","zzJcCfa6nuUF9A02Sud5fASxowM", "zzJcCfa6nuUF9A02Sud5fASxowM","zzJcCfa6nuUF9A02Sud5fASxowM", "zzJcCfa6nuUF9A02Sud5fASxowM","/+bx80f3gOoPMoFBsS+3xX6jpi8", "/+bx80f3gOoPMoFBsS+3xX6jpi8","/+bx80f3gOoPMoFBsS+3xX6jpi8"), location = c("100.50457_13.90834","100.51297_13.91534", "100.51323_13.91548", "100.50572_13.90243","100.50717_13.8986", "100.50979_13.89154", "100.51099_13.88835","100.6657_13.90103", "100.66742_13.90093", "100.66916_13.90055"), time = c("05:19:37", "05:21:37", "05:22:37", "05:24:37", "05:25:37","05:26:37", "05:28:37", "22:41:30", "22:42:30", "22:44:30"),for_hire_light = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)), row.names = c(NA,-10L), class = c("tbl_df", "tbl", "data.frame"))  

In my raw data, it still has some vehicle id that junk. For example, some vehicle has only "0" value in for_hire_light's column and it doesn't make sense. Can I make some code like I ganna remove the row if they have only "0" value (vehicle id doesn't have any 1 value)? So in each vehicle id, at least it has to start with 1 and end with 1.

I am finding ways to achieve my outputs like: I want to create new trips separate from the "for_hire_light" column. For example, 0 = has passenger and 1 = no passenger. My data is probes gps data and it was collected every 1 minute so In each trip, I want to select the first 0 in each vehicle id as the first start location ( and 0 value will continue until the passenger will get out of the car and that vehicle will turn to 1 value again) and the last 0 before it turns to 1 of that vehicle id is the finish location based on time. That's mean in each vehicle id, it can have several trips based on the "for_hire_light" column.

These are what the results' example that I prefer:

Trip ID start location starting time finish location finishing time
0001 100.50457_13.90834 05:19:37 100.63465_13.74289 23:59:17
0002 100.63233_13.84901 07:48:16 100.63194_13.84868 23:26:42

Is there any function from R that I should apply?

Upvotes: 0

Views: 81

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 389055

You can use data.table rleid to get a unique group number for every trip and select first and last location and times.

library(dplyr)

my.df %>%
  group_by(grp = data.table::rleid(for_hire_light)) %>%
  summarise(start_location = first(location), 
            end_location = last(location), 
            start_time = first(time), 
            end_time = last(time), 
            for_hire_light = first(for_hire_light)) %>%
  filter(for_hire_light == 0) %>%
  select(-for_hire_light, -grp) %>%
  mutate(trip_id = row_number(), .before = 1) -> result

result

Upvotes: 2

c0bra
c0bra

Reputation: 1090

Not tested as I don't have R acessable right now, but should work along these lines:

Find first and last times:

library(tidyverse) 
starting_positions <- table %>% filter(for_hire_light == 1) %>% group_by(vehicle_id) %>% top_n(1, `time`) %>% mutate(type="start")
finishing_positions <- table %>% group_by(vehicle_id) %>% top_n(-1, `time`) %>% mutate(type="end")
# join both tables and pivot to have start and finish times in the same line
starting_positions %>% full_join(finishing_positions, by="vehicle_id") %>% pivot_wider(id_cols=vehicle_id, names_from=type)

Upvotes: 2

Related Questions