Reputation: 21
Hi Guys I would like to assign an ID to the following dataframe. The Bus has to be the same if the arrival station matches departure station and if arrival time matches with the departure time. Does anyone has an idea how to solve this issue? Thanks in advance!
I would like to have the following:
Upvotes: 0
Views: 52
Reputation: 8699
One approach is to create a graph object where the composite key (station, timestamp) represents a graph vertex, and each route represents an edge. From this graph, each connected component then represents a unique route, so in your example you have the two components:
Component 1: (Station1 10:10) -> (Station2 10:15) -> (Station3 10:18) -> (Station4 10:20)
Component 2: (Station10 10:12) -> (Station 10:25)
Using the igraph
and Tidyverse packages (here dplyr
, magrittr
, and tibble
), this approach could be implemented like so:
# df is source data. create a composite key for
# arrival and departure by concatenating station
# name and timestamp
df %<>% mutate(arrkey = paste0(From, Departure),
depkey = paste0(To, Arrival));
# create graph, identify clusters, and convert clusters to data frame
components <- graph_from_data_frame(df %>% select(arrkey, depkey)) %>%
components() %>%
`$`('membership') %>%
as.data.frame() %>%
tibble::rownames_to_column() %T>%
{names(.) <- c('vertexkey','component')}
# join components with original data frame to produce output
df %>% inner_join(components, by=c('arrkey'='vertexkey')) %>%
select(ID=component, everything()) %>%
select(-arrkey, -depkey) %>%
arrange(ID, Departure)
This produces the desired output:
ID From To Departure Arrival
1 1 Station1 Station2 10:10 10:15
2 1 Station2 Station3 10:15 10:18
3 1 Station3 Station4 10:18 10:20
4 2 Station10 Station15 10:12 10:25
Note: I used the following code to produce df
(removing the date from departure/arrival for simplicity):
df <- data.frame(
From=c('Station1', 'Station10', 'Station2', 'Station3'),
To=c('Station2', 'Station15', 'Station3', 'Station4'),
Departure = c('10:10','10:12','10:15','10:18'),
Arrival = c('10:15','10:25','10:18','10:20'));
Upvotes: 1