I have a dataframe that looks something like this: CityFrom CityTo Count Paris Lyon 2 Lyon Paris 4 London Manchester 5 Manchester London 6 I want to add 2 additional columns to each row that will provide me a city name sorted regardless of the point of destination and origin: So, as an example, Lyon-Paris and Paris-Lyon are the same in this case. My initial idea was: combine the 2 columns i.e. CityFrom and CityTo using paste sort them alphabetically split them in 2 columns. As a use case, the row with Paris as CityFrom and Lyon as CityTo: paste the 2 columns to give - "Paris, Lyon". sort to give the output - "Lyon, Paris" split into 2 columns with Lyon as CityCodeBidirectionalFrom and Paris as CityCodeBidirectionalto. I haven't been able to implement the above logic.

Reputation: 173

Sort values from multiple column and create a new variable

I have a dataframe that looks something like this:

  CityFrom         CityTo   Count
     Paris           Lyon       2
      Lyon          Paris       4
    London     Manchester       5
Manchester         London       6

I want to add 2 additional columns to each row that will provide me a city name sorted regardless of the point of destination and origin:

So, as an example, Lyon-Paris and Paris-Lyon are the same in this case.

My initial idea was:

combine the 2 columns i.e. CityFrom and CityTo using paste
sort them alphabetically
split them in 2 columns.

As a use case, the row with Paris as CityFrom and Lyon as CityTo:

paste the 2 columns to give - "Paris, Lyon".
sort to give the output - "Lyon, Paris"
split into 2 columns with Lyon as CityCodeBidirectionalFrom and Paris as CityCodeBidirectionalto.

I haven't been able to implement the above logic.

Upvotes: 2

Answers (4)

M--

Reputation: 29203

For a tidyverse solution, we can put them into a list, sort them, and then unnest them. Look below;

library(dplyr)
library(tidyr)
library(purrr)


df1 %>% 
  mutate(CityCodeBidirectional = map2(CityFrom, CityTo, ~sort(c(.x , .y)))) %>% 
  unnest_wider(data = ., col = CityCodeBidirectional, 
                         names_sep = c("From", "To"), names_repair = "universal")

#> # A tibble: 4 x 5
#>   CityFrom   CityTo     Count CityCodeBidirectionalF~ CityCodeBidirectiona~
#>   <chr>      <chr>      <int> <chr>                   <chr>                
#> 1 Paris      Lyon           2 Lyon                    Paris                
#> 2 Lyon       Paris          4 Lyon                    Paris                
#> 3 London     Manchester     5 London                  Manchester           
#> 4 Manchester London         6 London                  Manchester

Data:

df1 <- read.table(text="CityFrom    CityTo  Count
                        Paris   Lyon    2
                        Lyon    Paris   4
                        London  Manchester  5
                        Manchester London       6",
                  header = T, stringsAsFactors = F)

Upvotes: 1

lroha

Reputation: 34621

Another option is to use pmin() and pmax().

cbind(df, sapply(c(CityCodeBidirectionalFrom = pmin,
                   CityCodeBidirectionalTo = pmax), do.call, df[1:2]))

    CityFrom     CityTo Count CityCodeBidirectionalFrom CityCodeBidirectionalTo
1      Paris       Lyon     2                      Lyon                   Paris
2       Lyon      Paris     4                      Lyon                   Paris
3     London Manchester     5                    London              Manchester
4 Manchester     London     6                    London              Manchester

Upvotes: 1

BigFinger

Reputation: 1043

You can sort the values without the need for concatenation, like this:

df = data.frame(
    CityFrom = c("Paris", "Lyon", "London", "Manchester"),
    CityTo = c("Lyon", "Paris", "Manchester", "London"),
    Count = c(2, 4, 5, 6),
    stringsAsFactors = FALSE
)

mysort = apply(df[,1:2], 1, sort)
rownames(mysort) = c("CityCodeBidirectionalFrom", "CityCodeBidirectionalTo")
cbind(df, t(mysort), stringsAsFactors=FALSE)
##     CityFrom     CityTo Count CityCodeBidirectionalFrom CityCodeBidirectionalTo
## 1      Paris       Lyon     2                      Lyon                   Paris
## 2       Lyon      Paris     4                      Lyon                   Paris
## 3     London Manchester     5                    London              Manchester
## 4 Manchester     London     6                    London              Manchester

Upvotes: 0

liborm

Reputation: 2734

You don't actually need to 'sort', just compare each pair. And you need to do it twice, flipping the condition to get the other from the pair:

library(tidyverse)

d %>% 
  mutate(
    CityCodeBidirectionalFrom = if_else(CityFrom < CityTo, CityFrom, CityTo),
    CityCodeBidirectionalTo = if_else(CityFrom > CityTo, CityFrom, CityTo))

(as you did not provide any testing data, my code is rather an untested pseudocode than a copy-paste solution...)

Upvotes: 2

Sort values from multiple column and create a new variable

Answers (4)

Related Questions