Reputation: 173
I have a dataframe that looks something like this:
CityFrom CityTo Count
Paris Lyon 2
Lyon Paris 4
London Manchester 5
Manchester London 6
I want to add 2 additional columns to each row that will provide me a city name sorted regardless of the point of destination and origin:
So, as an example, Lyon-Paris and Paris-Lyon are the same in this case.
My initial idea was:
paste
sort
them alphabeticallysplit
them in 2 columns. As a use case, the row with Paris as CityFrom and Lyon as CityTo:
paste
the 2 columns to give - "Paris, Lyon".
sort
to give the output - "Lyon, Paris"
split
into 2 columns with Lyon as CityCodeBidirectionalFrom and Paris as CityCodeBidirectionalto.
I haven't been able to implement the above logic.
Upvotes: 2
Views: 203
Reputation: 29203
For a tidyverse solution, we can put them into a list, sort them, and then unnest them. Look below;
library(dplyr)
library(tidyr)
library(purrr)
df1 %>%
mutate(CityCodeBidirectional = map2(CityFrom, CityTo, ~sort(c(.x , .y)))) %>%
unnest_wider(data = ., col = CityCodeBidirectional,
names_sep = c("From", "To"), names_repair = "universal")
#> # A tibble: 4 x 5
#> CityFrom CityTo Count CityCodeBidirectionalF~ CityCodeBidirectiona~
#> <chr> <chr> <int> <chr> <chr>
#> 1 Paris Lyon 2 Lyon Paris
#> 2 Lyon Paris 4 Lyon Paris
#> 3 London Manchester 5 London Manchester
#> 4 Manchester London 6 London Manchester
Data:
df1 <- read.table(text="CityFrom CityTo Count
Paris Lyon 2
Lyon Paris 4
London Manchester 5
Manchester London 6",
header = T, stringsAsFactors = F)
Upvotes: 1
Reputation: 34621
Another option is to use pmin()
and pmax()
.
cbind(df, sapply(c(CityCodeBidirectionalFrom = pmin,
CityCodeBidirectionalTo = pmax), do.call, df[1:2]))
CityFrom CityTo Count CityCodeBidirectionalFrom CityCodeBidirectionalTo
1 Paris Lyon 2 Lyon Paris
2 Lyon Paris 4 Lyon Paris
3 London Manchester 5 London Manchester
4 Manchester London 6 London Manchester
Upvotes: 1
Reputation: 1043
You can sort the values without the need for concatenation, like this:
df = data.frame(
CityFrom = c("Paris", "Lyon", "London", "Manchester"),
CityTo = c("Lyon", "Paris", "Manchester", "London"),
Count = c(2, 4, 5, 6),
stringsAsFactors = FALSE
)
mysort = apply(df[,1:2], 1, sort)
rownames(mysort) = c("CityCodeBidirectionalFrom", "CityCodeBidirectionalTo")
cbind(df, t(mysort), stringsAsFactors=FALSE)
## CityFrom CityTo Count CityCodeBidirectionalFrom CityCodeBidirectionalTo
## 1 Paris Lyon 2 Lyon Paris
## 2 Lyon Paris 4 Lyon Paris
## 3 London Manchester 5 London Manchester
## 4 Manchester London 6 London Manchester
Upvotes: 0
Reputation: 2734
You don't actually need to 'sort', just compare each pair. And you need to do it twice, flipping the condition to get the other from the pair:
library(tidyverse)
d %>%
mutate(
CityCodeBidirectionalFrom = if_else(CityFrom < CityTo, CityFrom, CityTo),
CityCodeBidirectionalTo = if_else(CityFrom > CityTo, CityFrom, CityTo))
(as you did not provide any testing data, my code is rather an untested pseudocode than a copy-paste solution...)
Upvotes: 2