biviz
biviz

Reputation: 173

Sort values from multiple column and create a new variable

I have a dataframe that looks something like this:

  CityFrom         CityTo   Count
     Paris           Lyon       2
      Lyon          Paris       4
    London     Manchester       5
Manchester         London       6

I want to add 2 additional columns to each row that will provide me a city name sorted regardless of the point of destination and origin:

enter image description here

So, as an example, Lyon-Paris and Paris-Lyon are the same in this case.

My initial idea was:

  1. combine the 2 columns i.e. CityFrom and CityTo using paste
  2. sort them alphabetically
  3. split them in 2 columns.

As a use case, the row with Paris as CityFrom and Lyon as CityTo:

  1. paste the 2 columns to give - "Paris, Lyon".

  2. sort to give the output - "Lyon, Paris"

  3. split into 2 columns with Lyon as CityCodeBidirectionalFrom and Paris as CityCodeBidirectionalto.

I haven't been able to implement the above logic.

Upvotes: 2

Views: 203

Answers (4)

M--
M--

Reputation: 29203

For a solution, we can put them into a list, sort them, and then unnest them. Look below;

library(dplyr)
library(tidyr)
library(purrr)


df1 %>% 
  mutate(CityCodeBidirectional = map2(CityFrom, CityTo, ~sort(c(.x , .y)))) %>% 
  unnest_wider(data = ., col = CityCodeBidirectional, 
                         names_sep = c("From", "To"), names_repair = "universal")

#> # A tibble: 4 x 5
#>   CityFrom   CityTo     Count CityCodeBidirectionalF~ CityCodeBidirectiona~
#>   <chr>      <chr>      <int> <chr>                   <chr>                
#> 1 Paris      Lyon           2 Lyon                    Paris                
#> 2 Lyon       Paris          4 Lyon                    Paris                
#> 3 London     Manchester     5 London                  Manchester           
#> 4 Manchester London         6 London                  Manchester

Data:

df1 <- read.table(text="CityFrom    CityTo  Count
                        Paris   Lyon    2
                        Lyon    Paris   4
                        London  Manchester  5
                        Manchester London       6",
                  header = T, stringsAsFactors = F)

Upvotes: 1

lroha
lroha

Reputation: 34621

Another option is to use pmin() and pmax().

cbind(df, sapply(c(CityCodeBidirectionalFrom = pmin,
                   CityCodeBidirectionalTo = pmax), do.call, df[1:2]))

    CityFrom     CityTo Count CityCodeBidirectionalFrom CityCodeBidirectionalTo
1      Paris       Lyon     2                      Lyon                   Paris
2       Lyon      Paris     4                      Lyon                   Paris
3     London Manchester     5                    London              Manchester
4 Manchester     London     6                    London              Manchester

Upvotes: 1

BigFinger
BigFinger

Reputation: 1043

You can sort the values without the need for concatenation, like this:

df = data.frame(
    CityFrom = c("Paris", "Lyon", "London", "Manchester"),
    CityTo = c("Lyon", "Paris", "Manchester", "London"),
    Count = c(2, 4, 5, 6),
    stringsAsFactors = FALSE
)

mysort = apply(df[,1:2], 1, sort)
rownames(mysort) = c("CityCodeBidirectionalFrom", "CityCodeBidirectionalTo")
cbind(df, t(mysort), stringsAsFactors=FALSE)
##     CityFrom     CityTo Count CityCodeBidirectionalFrom CityCodeBidirectionalTo
## 1      Paris       Lyon     2                      Lyon                   Paris
## 2       Lyon      Paris     4                      Lyon                   Paris
## 3     London Manchester     5                    London              Manchester
## 4 Manchester     London     6                    London              Manchester

Upvotes: 0

liborm
liborm

Reputation: 2734

You don't actually need to 'sort', just compare each pair. And you need to do it twice, flipping the condition to get the other from the pair:

library(tidyverse)

d %>% 
  mutate(
    CityCodeBidirectionalFrom = if_else(CityFrom < CityTo, CityFrom, CityTo),
    CityCodeBidirectionalTo = if_else(CityFrom > CityTo, CityFrom, CityTo))

(as you did not provide any testing data, my code is rather an untested pseudocode than a copy-paste solution...)

Upvotes: 2

Related Questions