Reputation: 57
I am working with a transportation dataset that records the departure and arrival locations of each trip. From that, I can use paste(arrival_to, departure_from, sep = "-") to create a route, for example A-B.
However, I also want to treat both directions of a round trip as one route. For example, both "from A to B" and "from B to A" should give A-B.
The dataset looks like this:
df <- data.frame(id = c(1, 1, 1, 2, 3),
                 departure_from = c("A", "B", "A", "B", "C"),
                 arrival_to = c("A", "A", "B", "A", "A"))
  id departure_from arrival_to
1  1              A          A
2  1              B          A
3  1              A          B
4  2              B          A
5  3              C          A
What I want is this:
  id departure_from arrival_to route
1  1              A          A   A-A
2  1              B          A   A-B
3  1              A          B   A-B
4  2              B          A   A-B
5  3              C          A   A-C
What I am doing for now is to "exploit" a fact about my dataset: the two directions of a route, "A-B" and "B-A", usually have similar observation counts. So I wrote a lengthy summarize and arrange on the counts and used lag to match each row with the previous one. This is error-prone, so I am looking for a more robust, code-based solution.
Thank you!
Upvotes: 0
Views: 53
Reputation: 101247
You can try pmin and pmax, as below:
transform(
  df,
  route = paste0(pmin(departure_from, arrival_to), "-", pmax(departure_from, arrival_to))
)
which gives
  id departure_from arrival_to route
1  1              A          A   A-A
2  1              B          A   A-B
3  1              A          B   A-B
4  2              B          A   A-B
5  3              C          A   A-C
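This works because pmin and pmax compare the two columns element by element, so each pair comes out in alphabetical order regardless of the direction of travel. A minimal sketch of the same idea wrapped in a helper follows; make_route is a hypothetical name used only for illustration, and the as.character calls are an added guard on the assumption that the columns might be factors in some R setups.

```r
# Hypothetical helper (not from any package): builds a direction-independent
# route label by putting the alphabetically smaller endpoint first.
make_route <- function(from, to, sep = "-") {
  from <- as.character(from)  # guard in case the columns are factors
  to   <- as.character(to)
  # pmin/pmax are vectorized, so no loop or rowwise step is needed
  paste(pmin(from, to), pmax(from, to), sep = sep)
}

df <- data.frame(id = c(1, 1, 1, 2, 3),
                 departure_from = c("A", "B", "A", "B", "C"),
                 arrival_to = c("A", "A", "B", "A", "A"),
                 stringsAsFactors = FALSE)

df$route <- make_route(df$departure_from, df$arrival_to)
```

Note that string comparison follows the session's locale collation, and a missing location would end up as the literal text "NA" in the label, so real data may need a check for NA values first.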
Upvotes: 2
Reputation: 6206
How about this solution, sorting the pairs beforehand?
library(dplyr)
df %>%
  rowwise() %>%
  mutate(route = paste(sort(c(departure_from, arrival_to)), collapse = "-"))
     id departure_from arrival_to route
  <dbl> <chr>          <chr>      <chr>
1     1 A              A          A-A
2     1 B              A          A-B
3     1 A              B          A-B
4     2 B              A          A-B
5     3 C              A          A-C
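If you would rather not depend on dplyr, the same sort-each-pair idea can be sketched in base R with mapply. This is only an illustration under the assumption that both columns are character vectors; route_counts is a name made up for the example. It also shows that both directions then fall into one group when counted.

```r
df <- data.frame(id = c(1, 1, 1, 2, 3),
                 departure_from = c("A", "B", "A", "B", "C"),
                 arrival_to = c("A", "A", "B", "A", "A"),
                 stringsAsFactors = FALSE)

# Sort each (departure, arrival) pair and join it with "-".
# USE.NAMES = FALSE keeps the result an unnamed character vector.
df$route <- mapply(
  function(from, to) paste(sort(c(from, to)), collapse = "-"),
  df$departure_from, df$arrival_to,
  USE.NAMES = FALSE
)

# Count trips per direction-independent route.
route_counts <- table(df$route)
```

With the sample data, A-B is counted three times, because rows 2, 3 and 4 all map to the same label.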
Upvotes: 1