Reputation: 183
I need to order a data.frame
based on 2 columns and a given vector of values.
Here is an example of my df:
df = data.frame(A = rnorm(45),
B = rep(c('a', 'b', 'c'), each= 5, times = 3),
C = rep(c(10, 20, 30), each = 15))
I need to change the order of column B
from c('a', 'b', 'c')
to c('c', 'a', 'b')
while keeping column C
grouped into its three blocks (10, 20, 30).
Here are the first 30 rows of my desired output:
A B C
-0.11451485 c 10
-0.11860742 c 10
0.08156183 c 10
1.11850750 c 10
-0.79072556 c 10
1.24141030 a 10
0.88538811 a 10
-1.35548712 a 10
0.05723677 a 10
0.14660464 a 10
-0.28587107 b 10
0.59452832 b 10
1.00163605 b 10
1.15892322 b 10
-1.41771696 b 10
-2.05743546 c 20
-1.22835358 c 20
1.50060736 c 20
-0.14956114 c 20
-1.13126592 c 20
1.08571256 a 20
-1.04991699 a 20
-1.50655996 a 20
-0.63675392 a 20
-0.26485423 a 20
0.30509657 b 20
0.85471772 b 20
-0.54064736 b 20
0.24578056 b 20
0.14917900 b 20
Any help will be really appreciated, thanks
Upvotes: 0
Views: 437
Reputation: 9583
You can first use factor()
to convert B
to a factor whose levels are in the order you define. With that, you can order your data frame by C and then by B
to get your desired output.
Generating some data:
set.seed(10)
df = data.frame(A = rnorm(45),
B = rep(c('a', 'b', 'c'), each= 5, times = 3),
C = rep(c(10, 20, 30), each = 15))
Then use levels
to re-level the factor before ordering the data frame:
# Set the factor levels to the desired order
df$B <- factor(df$B, levels = c('c', 'a', 'b'))
# Order whole rows by C, then by B (sorting df$B on its own would detach B from A)
df <- df[order(df$C, df$B), ]
Output (first 20 rows):
1.0177950 c 10
0.75578151 c 10
-0.23823356 c 10
0.98744470 c 10
0.74139013 c 10
0.01874617 a 10
-0.18425254 a 10
-1.37133055 a 10
-0.59916772 a 10
0.29454513 a 10
0.38979430 b 10
-1.20807618 b 10
-0.36367602 b 10
-1.62667268 b 10
-0.25647839 b 10
-0.37366156 c 20
-0.68755543 c 20
-0.87215883 c 20
-0.10176101 c 20
-0.25378053 c 20
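If you would rather keep B as a character column, a minimal sketch of the same idea uses match() to turn each value of B into its position in the desired order and passes that to order(). The names custom_order and df_sorted are only illustrative, and the data is regenerated here so the snippet runs on its own.

```r
set.seed(10)
df <- data.frame(A = rnorm(45),
                 B = rep(c('a', 'b', 'c'), each = 5, times = 3),
                 C = rep(c(10, 20, 30), each = 15))

# Desired order of the values in B (illustrative name)
custom_order <- c('c', 'a', 'b')

# Sort by C first, then by the position of B in custom_order.
# Whole rows are reordered, so each A stays attached to its B and C.
df_sorted <- df[order(df$C, match(df$B, custom_order)), ]
head(df_sorted, 15)
```

This leaves the type of B untouched, which may be convenient if later code expects a plain character column.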
Upvotes: 1
Reputation: 39154
The key is to set the levels of the factor column to the desired order. After that, we can use arrange
from the dplyr
package to sort by multiple columns. Note that your original post does not require sorting by column A. I only added column A to the arrange
call to show that it is easy to include more than two columns in the arrange
function.
library(dplyr)

df2 <- df %>%
  # Set the factor levels to the desired order
  mutate(B = factor(B, levels = c("c", "a", "b"))) %>%
  # Sort by C, then B, then A
  arrange(C, B, A)
df2
# A B C
# 1 -2.39317699 c 10
# 2 -1.48901928 c 10
# 3 -0.42562766 c 10
# 4 0.03383395 c 10
# 5 0.66362189 c 10
# 6 -0.65324997 a 10
# 7 -0.59408686 a 10
# 8 0.37012883 a 10
# 9 0.53238177 a 10
# 10 3.03972004 a 10
# 11 -2.03192274 b 10
# 12 -1.05138447 b 10
# 13 -0.80795342 b 10
# 14 1.74526091 b 10
# 15 2.07681466 b 10
# 16 -1.90573715 c 20
# 17 -0.72626244 c 20
# 18 -0.48017481 c 20
# 19 -0.42995920 c 20
# 20 0.17729002 c 20
# 21 -0.62947278 a 20
# 22 -0.40038152 a 20
# 23 -0.23368555 a 20
# 24 0.44218806 a 20
# 25 1.58561071 a 20
# 26 -0.66270426 b 20
# 27 -0.50256255 b 20
# 28 -0.19890974 b 20
# 29 0.26562533 b 20
# 30 1.84093124 b 20
# 31 -0.93702848 c 30
# 32 0.10804529 c 30
# 33 0.25758608 c 30
# 34 1.33084399 c 30
# 35 1.67204875 c 30
# 36 -1.88922564 a 30
# 37 -1.74551938 a 30
# 38 -1.32215854 a 30
# 39 -0.43743607 a 30
# 40 1.07554466 a 30
# 41 -0.38154167 b 30
# 42 0.53823057 b 30
# 43 0.83401316 b 30
# 44 1.04418363 b 30
# 45 2.45985490 b 30
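To see why setting the levels is the key step, here is a small sketch in base R. A factor sorts by the position of each value in its levels rather than alphabetically. The vector x is made up for illustration.

```r
x <- c('a', 'b', 'c', 'a', 'b', 'c')

# A character vector sorts alphabetically
sort(x)

# A factor sorts by the position of each value in its levels
f <- factor(x, levels = c('c', 'a', 'b'))
sort(f)

# The underlying integer codes follow the level order: c = 1, a = 2, b = 3
as.integer(f)
```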
Upvotes: 1