aaaaa

Reputation: 183

order data.frame based on 2 columns and vector of variables

I need to order a data.frame based on 2 columns and a given vector of variables.

Here is an example of my df:

df = data.frame(A = rnorm(45),
                B = rep(c('a', 'b', 'c'), each= 5, times = 3),
                C = rep(c(10, 20, 30), each = 15))

I need to change the order of column B from c('a', 'b', 'c') to c('c', 'a', 'b') while keeping the rows grouped by the three values of column C.

Here are the first 30 rows of my desired output:

  A          B  C
 -0.11451485 c 10
 -0.11860742 c 10
  0.08156183 c 10
  1.11850750 c 10
 -0.79072556 c 10
  1.24141030 a 10
  0.88538811 a 10
 -1.35548712 a 10
  0.05723677 a 10
  0.14660464 a 10
 -0.28587107 b 10
  0.59452832 b 10
  1.00163605 b 10
  1.15892322 b 10
 -1.41771696 b 10
 -2.05743546 c 20
 -1.22835358 c 20
  1.50060736 c 20
 -0.14956114 c 20
 -1.13126592 c 20
  1.08571256 a 20
 -1.04991699 a 20
 -1.50655996 a 20
 -0.63675392 a 20
 -0.26485423 a 20
  0.30509657 b 20
  0.85471772 b 20
 -0.54064736 b 20
  0.24578056 b 20
  0.14917900 b 20

Any help will be really appreciated, thanks

Upvotes: 0

Views: 437

Answers (2)

onlyphantom

Reputation: 9583

You can first use factor() to set the levels of B in the order you want. Once that is done, ordering the data frame follows the level order rather than alphabetical order and gives your desired output.
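To see why the levels matter, here is a minimal sketch on a toy vector (the vector `x` is made up for illustration and is not the data from the question). It shows that a factor sorts by the position of each value in `levels`, not alphabetically:

```r
# Toy vector, made up for illustration
x <- c("a", "b", "c", "a", "c")

# A factor sorts by the position of each value in `levels`
f <- factor(x, levels = c("c", "a", "b"))

sorted_f <- sort(f)
as.character(sorted_f)
# "c" "c" "a" "a" "b"

# order() relies on the same level positions
idx <- order(f)
idx
# 3 5 1 4 2
```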

Generating some data:

set.seed(10)
df = data.frame(A = rnorm(45),
                B = rep(c('a', 'b', 'c'), each= 5, times = 3),
                C = rep(c(10, 20, 30), each = 15))

And using levels to re-level your factor before ordering the data frame:

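df$B <- factor(df$B, levels = c('c', 'a', 'b'))
# Order by C first, then by B; B now sorts in level order (c, a, b).
# Sorting df$B on its own would detach it from the rows of A and C.
df <- df[order(df$C, df$B), ]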

Output (first 20 rows):

 1.0177950  c 10
 0.75578151 c 10
-0.23823356 c 10
 0.98744470 c 10
 0.74139013 c 10
 0.01874617 a 10
-0.18425254 a 10
-1.37133055 a 10
-0.59916772 a 10
 0.29454513 a 10
 0.38979430 b 10
-1.20807618 b 10
-0.36367602 b 10
-1.62667268 b 10
-0.25647839 b 10
-0.37366156 c 20
-0.68755543 c 20
-0.87215883 c 20
-0.10176101 c 20
-0.25378053 c 20
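If you would rather not convert B to a factor, a possible alternative is to order by the position of each B value in the target vector using match(). This is only a sketch under that assumption; the names `b_order` and `df_sorted` are mine, not from the question:

```r
set.seed(10)
df <- data.frame(A = rnorm(45),
                 B = rep(c("a", "b", "c"), each = 5, times = 3),
                 C = rep(c(10, 20, 30), each = 15))

# Desired order of the values in B (taken from the question)
b_order <- c("c", "a", "b")

# match() returns the position of each B value in b_order, so order()
# sorts by C first and then by that position; column B is left unchanged
df_sorted <- df[order(df$C, match(df$B, b_order)), ]

head(df_sorted, 15)
```

The trade-off is that the custom order lives in `b_order` rather than in the column itself, so later sorts on B fall back to the default order unless you pass `b_order` again.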

Upvotes: 1

www

Reputation: 39154

The key is to change the levels of the factor column. After that, we can use arrange from the dplyr package to sort by multiple columns. Note that your original post does not require sorting by column A. I only added A to the arrange call to show how easy it is to sort by more than two columns.

library(dplyr)

df2 <- df %>%
  # Change the level of the factor
  mutate(B = factor(B, levels = c("c", "a", "b"))) %>%
  # Arrange the column
  arrange(C, B, A)
df2
#              A B  C
# 1  -2.39317699 c 10
# 2  -1.48901928 c 10
# 3  -0.42562766 c 10
# 4   0.03383395 c 10
# 5   0.66362189 c 10
# 6  -0.65324997 a 10
# 7  -0.59408686 a 10
# 8   0.37012883 a 10
# 9   0.53238177 a 10
# 10  3.03972004 a 10
# 11 -2.03192274 b 10
# 12 -1.05138447 b 10
# 13 -0.80795342 b 10
# 14  1.74526091 b 10
# 15  2.07681466 b 10
# 16 -1.90573715 c 20
# 17 -0.72626244 c 20
# 18 -0.48017481 c 20
# 19 -0.42995920 c 20
# 20  0.17729002 c 20
# 21 -0.62947278 a 20
# 22 -0.40038152 a 20
# 23 -0.23368555 a 20
# 24  0.44218806 a 20
# 25  1.58561071 a 20
# 26 -0.66270426 b 20
# 27 -0.50256255 b 20
# 28 -0.19890974 b 20
# 29  0.26562533 b 20
# 30  1.84093124 b 20
# 31 -0.93702848 c 30
# 32  0.10804529 c 30
# 33  0.25758608 c 30
# 34  1.33084399 c 30
# 35  1.67204875 c 30
# 36 -1.88922564 a 30
# 37 -1.74551938 a 30
# 38 -1.32215854 a 30
# 39 -0.43743607 a 30
# 40  1.07554466 a 30
# 41 -0.38154167 b 30
# 42  0.53823057 b 30
# 43  0.83401316 b 30
# 44  1.04418363 b 30
# 45  2.45985490 b 30
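As a variation, here is a sketch that leaves column B as it is and passes the custom order to arrange through match(). It assumes dplyr is installed; `b_order` and `df3` are names I made up for illustration:

```r
library(dplyr)

set.seed(10)
df <- data.frame(A = rnorm(45),
                 B = rep(c("a", "b", "c"), each = 5, times = 3),
                 C = rep(c(10, 20, 30), each = 15))

# Desired order of the values in B (taken from the question)
b_order <- c("c", "a", "b")

df3 <- df %>%
  # Sort by C, then by the position of each B value in b_order
  arrange(C, match(B, b_order))

head(df3, 15)
```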

Upvotes: 1
