Reputation: 453
I am new from R, please help me. Here are my reproducible example
This is what the sample looks like:
# A tibble: 10 x 4
vehicle_id location time for_hire_light
<chr> <chr> <chr> <dbl>
1 zzJcCfa6nuUF9A02Sud5fASxowM 100.50457_13.90834 05:19:37 0
2 zzJcCfa6nuUF9A02Sud5fASxowM 100.51297_13.91534 05:21:37 0
3 zzJcCfa6nuUF9A02Sud5fASxowM 100.51323_13.91548 05:22:37 0
4 zzJcCfa6nuUF9A02Sud5fASxowM 100.50572_13.90243 05:24:37 0
5 zzJcCfa6nuUF9A02Sud5fASxowM 100.50717_13.8986 05:25:37 0
6 zzJcCfa6nuUF9A02Sud5fASxowM 100.50979_13.89154 05:26:37 0
7 zzJcCfa6nuUF9A02Sud5fASxowM 100.51099_13.88835 05:28:37 0
8 /+bx80f3gOoPMoFBsS+3xX6jpi8 100.6657_13.90103 22:41:30 1
9 /+bx80f3gOoPMoFBsS+3xX6jpi8 100.66742_13.90093 22:42:30 1
10 /+bx80f3gOoPMoFBsS+3xX6jpi8 100.66916_13.90055 22:44:30 1
my.df <- structure(list(vehicle_id = c("zzJcCfa6nuUF9A02Sud5fASxowM","zzJcCfa6nuUF9A02Sud5fASxowM", "zzJcCfa6nuUF9A02Sud5fASxowM","zzJcCfa6nuUF9A02Sud5fASxowM", "zzJcCfa6nuUF9A02Sud5fASxowM","zzJcCfa6nuUF9A02Sud5fASxowM", "zzJcCfa6nuUF9A02Sud5fASxowM","/+bx80f3gOoPMoFBsS+3xX6jpi8", "/+bx80f3gOoPMoFBsS+3xX6jpi8","/+bx80f3gOoPMoFBsS+3xX6jpi8"), location = c("100.50457_13.90834","100.51297_13.91534", "100.51323_13.91548", "100.50572_13.90243","100.50717_13.8986", "100.50979_13.89154", "100.51099_13.88835","100.6657_13.90103", "100.66742_13.90093", "100.66916_13.90055"), time = c("05:19:37", "05:21:37", "05:22:37", "05:24:37", "05:25:37","05:26:37", "05:28:37", "22:41:30", "22:42:30", "22:44:30"),for_hire_light = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)), row.names = c(NA,-10L), class = c("tbl_df", "tbl", "data.frame"))
In my raw data, it still has some vehicle id that junk. For example, some vehicle has only "0" value in for_hire_light's column and it doesn't make sense. Can I make some code like I ganna remove the row if they have only "0" value (vehicle id doesn't have any 1 value)? So in each vehicle id, at least it has to start with 1 and end with 1.
I am finding ways to achieve my outputs like: I want to create new trips separate from the "for_hire_light" column. For example, 0 = has passenger and 1 = no passenger. My data is probes gps data and it was collected every 1 minute so In each trip, I want to select the first 0 in each vehicle id as the first start location ( and 0 value will continue until the passenger will get out of the car and that vehicle will turn to 1 value again) and the last 0 before it turns to 1 of that vehicle id is the finish location based on time. That's mean in each vehicle id, it can have several trips based on the "for_hire_light" column.
These are what the results' example that I prefer:
Trip ID | start location | starting time | finish location | finishing time |
---|---|---|---|---|
0001 | 100.50457_13.90834 | 05:19:37 | 100.63465_13.74289 | 23:59:17 |
0002 | 100.63233_13.84901 | 07:48:16 | 100.63194_13.84868 | 23:26:42 |
Is there any function from R that I should apply?
Upvotes: 0
Views: 81
Reputation: 389055
You can use data.table
rleid
to get a unique group number for every trip and select first and last location and times.
library(dplyr)
my.df %>%
group_by(grp = data.table::rleid(for_hire_light)) %>%
summarise(start_location = first(location),
end_location = last(location),
start_time = first(time),
end_time = last(time),
for_hire_light = first(for_hire_light)) %>%
filter(for_hire_light == 0) %>%
select(-for_hire_light, -grp) %>%
mutate(trip_id = row_number(), .before = 1) -> result
result
Upvotes: 2
Reputation: 1090
Not tested as I don't have R acessable right now, but should work along these lines:
Find first and last times:
library(tidyverse)
starting_positions <- table %>% filter(for_hire_light == 1) %>% group_by(vehicle_id) %>% top_n(1, `time`) %>% mutate(type="start")
finishing_positions <- table %>% group_by(vehicle_id) %>% top_n(-1, `time`) %>% mutate(type="end")
# join both tables and pivot to have start and finish times in the same line
starting_positions %>% full_join(finishing_positions, by="vehicle_id") %>% pivot_wider(id_cols=vehicle_id, names_from=type)
Upvotes: 2