Reputation: 21
I'm trying to sort a data frame in R across multiple columns to allow for easier usage later. The problem is that I have no way of knowing how many columns there will be exactly. For test purposes, I'm taking the following sample:
V1 V2 V3 V4 V5
1 -0.3798680 0 0 -1 -1
2 -0.1782780 0 0 -1 -1
3 0.9862250 -2 -1 0 0
4 0.6831790 -2 -1 -1 1
5 -0.5814570 0 -1 -1 -1
6 -0.3909930 0 1 -1 0
7 0.1629140 -1 -2 -1 0
8 -0.3417220 0 0 0 -1
9 -0.3613250 0 0 -1 0
10 -0.2879470 0 -1 -1 0
11 0.2958940 -1 -1 0 0
12 0.3984110 -2 -1 1 0
13 -0.7388080 1 1 -1 0
14 -0.4037090 0 0 0 -1
15 0.5192050 -2 -1 1 1
16 0.0474172 -1 -1 -1 1
17 -0.6458280 0 0 -1 0
18 -0.4018540 0 0 0 -1
19 -0.3748340 0 0 0 0
20 -0.2182780 -1 0 0 1
and apply the following:
test.data.sorted.1 <- test.data[order(test.data[,2], test.data[,3], test.data[,4], test.data[,5]),]
and get
V1 V2 V3 V4 V5
4 0.6831790 -2 -1 -1 1
3 0.9862250 -2 -1 0 0
12 0.3984110 -2 -1 1 0
15 0.5192050 -2 -1 1 1
7 0.1629140 -1 -2 -1 0
16 0.0474172 -1 -1 -1 1
11 0.2958940 -1 -1 0 0
20 -0.2182780 -1 0 0 1
5 -0.5814570 0 -1 -1 -1
10 -0.2879470 0 -1 -1 0
1 -0.3798680 0 0 -1 -1
2 -0.1782780 0 0 -1 -1
9 -0.3613250 0 0 -1 0
17 -0.6458280 0 0 -1 0
8 -0.3417220 0 0 0 -1
14 -0.4037090 0 0 0 -1
18 -0.4018540 0 0 0 -1
19 -0.3748340 0 0 0 0
6 -0.3909930 0 1 -1 0
13 -0.7388080 1 1 -1 0
This gives the result I want: when the data frame is sorted by one column, the ordering from the previously sorted columns is still preserved. But written this way it is cluttered, and it is inflexible with regard to the number of columns the data frame I'm feeding it might have. So let's say the variable "columns" holds the number of columns in the data frame. If I try, say,
test.data.sorted.2 <- test.data[order(test.data[,2:columns]),]
it gives me
V1 V2 V3 V4 V5
3 0.9862250 -2 -1 0 0
4 0.6831790 -2 -1 -1 1
12 0.3984110 -2 -1 1 0
15 0.5192050 -2 -1 1 1
7 0.1629140 -1 -2 -1 0
11 0.2958940 -1 -1 0 0
16 0.0474172 -1 -1 -1 1
20 -0.2182780 -1 0 0 1
1 -0.3798680 0 0 -1 -1
2 -0.1782780 0 0 -1 -1
5 -0.5814570 0 -1 -1 -1
6 -0.3909930 0 1 -1 0
8 -0.3417220 0 0 0 -1
9 -0.3613250 0 0 -1 0
10 -0.2879470 0 -1 -1 0
14 -0.4037090 0 0 0 -1
17 -0.6458280 0 0 -1 0
18 -0.4018540 0 0 0 -1
19 -0.3748340 0 0 0 0
13 -0.7388080 1 1 -1 0
which only seems to have sorted by the second column. Similarly, if I run a for loop like
for (i in 2:columns){
test.data.sorted.3 <- test.data[order(test.data[,i]),]
}
I get the following:
V1 V2 V3 V4 V5
1 -0.3798680 0 0 -1 -1
2 -0.1782780 0 0 -1 -1
5 -0.5814570 0 -1 -1 -1
8 -0.3417220 0 0 0 -1
14 -0.4037090 0 0 0 -1
18 -0.4018540 0 0 0 -1
3 0.9862250 -2 -1 0 0
6 -0.3909930 0 1 -1 0
7 0.1629140 -1 -2 -1 0
9 -0.3613250 0 0 -1 0
10 -0.2879470 0 -1 -1 0
11 0.2958940 -1 -1 0 0
12 0.3984110 -2 -1 1 0
13 -0.7388080 1 1 -1 0
17 -0.6458280 0 0 -1 0
19 -0.3748340 0 0 0 0
4 0.6831790 -2 -1 -1 1
15 0.5192050 -2 -1 1 1
16 0.0474172 -1 -1 -1 1
20 -0.2182780 -1 0 0 1
which isn't what I'm looking for, either. The question is: how do I achieve the same result as the first example while still keeping the number of columns that I might have to go through flexible?
Upvotes: 0
Views: 48
Reputation: 6770
The first thing you have is equivalent to
iris[order(iris[,2], iris[,3], iris[,4]),]
This
iris[order(iris[,2:4]),]
is what you tried, and it doesn't work as expected, so it won't work for an unknown number of columns either. (It fails because you are supplying a single data frame, not a comma-separated set of vectors.)
The ?order help page is a bit confusing, but the point is that order() needs the sort keys as separate, comma-separated vectors. The help file suggests using do.call for this.
iris[do.call(order, c(iris[2:length(iris)])),]
That should do the trick.
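Applied to the sample from the question, the same idea might look like the sketch below. The data frame is rebuilt by hand from the posted rows, and the names key.cols, keys and explicit are only illustrative. One assumption worth flagging: the key list is passed through unname(), because do.call matches list names against order()'s own arguments, so a column that happened to be called "decreasing" or "method" would otherwise be taken as an option rather than a sort key.

```r
# Rebuild the sample from the question (column names V1..V5 as shown there).
test.data <- data.frame(
  V1 = c(-0.3798680, -0.1782780, 0.9862250, 0.6831790, -0.5814570,
         -0.3909930, 0.1629140, -0.3417220, -0.3613250, -0.2879470,
         0.2958940, 0.3984110, -0.7388080, -0.4037090, 0.5192050,
         0.0474172, -0.6458280, -0.4018540, -0.3748340, -0.2182780),
  V2 = c(0, 0, -2, -2, 0, 0, -1, 0, 0, 0, -1, -2, 1, 0, -2, -1, 0, 0, 0, -1),
  V3 = c(0, 0, -1, -1, -1, 1, -2, 0, 0, -1, -1, -1, 1, 0, -1, -1, 0, 0, 0, 0),
  V4 = c(-1, -1, 0, -1, -1, -1, -1, 0, -1, -1, 0, 1, -1, 0, 1, -1, -1, 0, 0, 0),
  V5 = c(-1, -1, 0, 1, -1, 0, 0, -1, 0, 0, 0, 0, 0, -1, 1, 1, 0, -1, 0, 1)
)

# Sort keys: every column from the second to the last, however many there are.
key.cols <- 2:ncol(test.data)

# Turn the key columns into a plain, unnamed list so that each column
# becomes one positional argument of order().
keys <- unname(as.list(test.data[key.cols]))

# Equivalent to order(V2, V3, V4, V5) for this sample.
test.data.sorted <- test.data[do.call(order, keys), ]

# Sanity check against the explicit call from the question.
explicit <- test.data[order(test.data[, 2], test.data[, 3],
                            test.data[, 4], test.data[, 5]), ]
stopifnot(identical(test.data.sorted, explicit))
```

Nothing in this depends on there being five columns; only key.cols changes with the width of the data frame.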
To me, the help page is confusing in the way a lot of the old help files are: it doesn't explain much.
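As for the for loop in the question: each pass sorts the original data frame by a single column and overwrites the previous result, so only the last column ends up mattering. If you really want a loop, a rough sketch is to keep re-sorting the running result, going from the last key to the first, and rely on a stable sort so that earlier passes survive as tie-breakers. The variable names here are made up, and method = "radix" is passed explicitly only because that method is documented as stable.

```r
# Loop-based alternative, shown on the built-in iris data.
sorted <- iris

# Go from the least significant key (last column) to the most
# significant one (column 2), re-sorting the running result each time.
for (i in rev(2:ncol(iris))) {
  # A stable sort keeps rows that tie on column i in their current order,
  # which is the order produced by the less significant keys.
  sorted <- sorted[order(sorted[[i]], method = "radix"), ]
}

# Same row order as the do.call approach.
expected <- iris[do.call(order, unname(as.list(iris[2:ncol(iris)]))), ]
stopifnot(identical(rownames(sorted), rownames(expected)))
```

The do.call version is shorter and sorts once, so I'd still reach for that first.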
Upvotes: 1