Conditional sorting of column in R

Question

Below is a data frame of four columns, the requirement is that for all common "a1" and "a2" column value pairs, I want to study the corresponding ascending order of the a3 columns and set the corresponding column a4 values in ascending order. E.g. See the value "A" and "M" in "a1" and "a2", if you check the order of corresponding a3 values for "A" and "M" i.e.(10,32,13), I want the corresponding a4 values to be arranged in the order (5,55,23) Similarly for all similar pairs. Thanks and please help.

 a1 = c("A", "B", "C",  "A",  "B",  "B",  "A", "C",  "A", "C", "B")
 a2 = c("M", "O", "N" , "N",  "O",  "N",  "M", "M",  "M", "N", "O")
 a3 = c(10,  34,   35 ,  56 ,  44 ,  37 , 32 ,  15 , 13 ,  12, 37)
 a4 = c(55,  6,    3,    213,  478,  233, 23,   14,  5,    7,  74)
 a123 = data.frame(a1, a2, a3, a4)

AntoniosK · Accepted Answer

Here's a dplyr solution using the rank/order of a3 values within each a1,a2 combination:

a1 = c("A", "B", "C",  "A",  "B",  "B",  "A", "C",  "A", "C", "B")
a2 = c("M", "O", "N" , "N",  "O",  "N",  "M", "M",  "M", "N", "O")
a3 = c(10,  34,   35 ,  56 ,  44 ,  37 , 32 ,  15 , 13 ,  12, 37)
a4 = c(55,  6,    3,    213,  478,  233, 23,   14,  5,    7,  74)

# datasets
a123 = data.frame(a1, a2, a3)
a124 = data.frame(a1, a2, a4)

library(dplyr)

# add the rank to each dataset
# based on the order of the values in a3 and a4
a123 = a123 %>%
  group_by(a1,a2) %>%
  mutate(r = rank(a3)) %>%
  ungroup()

a124 = a124 %>%
  group_by(a1,a2) %>%
  mutate(r = rank(a4)) %>%
  ungroup()

# join datasets based on a1, a2 combinations and the rank
inner_join(a123, a124, by=c("a1","a2","r"))

# # A tibble: 11 x 5
#   a1     a2        a3     r     a4
#        
# 1 A      M       10.0  1.00   5.00
# 2 B      O       34.0  1.00   6.00
# 3 C      N       35.0  2.00   7.00
# 4 A      N       56.0  1.00 213   
# 5 B      O       44.0  3.00 478   
# 6 B      N       37.0  1.00 233   
# 7 A      M       32.0  3.00  55.0 
# 8 C      M       15.0  1.00  14.0 
# 9 A      M       13.0  2.00  23.0 
# 10 C     N       12.0  1.00   3.00
# 11 B     O       37.0  2.00  74.0

You can remove column r if you want. I left it there so you can easily see how it works.

Based on the above philosophy/approach you can also use purrr and dplyr like this:

a1 = c("A", "B", "C",  "A",  "B",  "B",  "A", "C",  "A", "C", "B")
a2 = c("M", "O", "N" , "N",  "O",  "N",  "M", "M",  "M", "N", "O")
a3 = c(10,  34,   35 ,  56 ,  44 ,  37 , 32 ,  15 , 13 ,  12, 37)
a4 = c(55,  6,    3,    213,  478,  233, 23,   14,  5,    7,  74)

# datasets
a123 = data.frame(a1, a2, v=a3)
a124 = data.frame(a1, a2, v=a4)

library(dplyr)
library(purrr)

list(a123, a124) %>%
  map(. %>%
      group_by(a1,a2) %>%
      mutate(r = rank(v)) %>%
      ungroup()) %>%
  reduce(inner_join, by=c("a1","a2","r"))

# # A tibble: 11 x 5
#   a1     a2       v.x     r    v.y
#        
# 1 A      M       10.0  1.00   5.00
# 2 B      O       34.0  1.00   6.00
# 3 C      N       35.0  2.00   7.00
# 4 A      N       56.0  1.00 213   
# 5 B      O       44.0  3.00 478   
# 6 B      N       37.0  1.00 233   
# 7 A      M       32.0  3.00  55.0 
# 8 C      M       15.0  1.00  14.0 
# 9 A      M       13.0  2.00  23.0 
# 10 C      N       12.0  1.00   3.00
# 11 B      O       37.0  2.00  74.0

Conditional sorting of column in R

Answers (2)

Related Questions