PDotAlex

Reputation: 21

Sorting across a variable number of columns in R

I'm trying to sort a data frame in R across multiple columns to allow for easier usage later. The problem is that I have no way of knowing how many columns there will be exactly. For test purposes, I'm taking the following sample:

           V1 V2 V3 V4 V5
1  -0.3798680  0  0 -1 -1
2  -0.1782780  0  0 -1 -1
3   0.9862250 -2 -1  0  0
4   0.6831790 -2 -1 -1  1
5  -0.5814570  0 -1 -1 -1
6  -0.3909930  0  1 -1  0
7   0.1629140 -1 -2 -1  0
8  -0.3417220  0  0  0 -1
9  -0.3613250  0  0 -1  0
10 -0.2879470  0 -1 -1  0
11  0.2958940 -1 -1  0  0
12  0.3984110 -2 -1  1  0
13 -0.7388080  1  1 -1  0
14 -0.4037090  0  0  0 -1
15  0.5192050 -2 -1  1  1
16  0.0474172 -1 -1 -1  1
17 -0.6458280  0  0 -1  0
18 -0.4018540  0  0  0 -1
19 -0.3748340  0  0  0  0
20 -0.2182780 -1  0  0  1

and apply the following:

test.data.sorted.1 <- test.data[order(test.data[,2], test.data[,3], test.data[,4], test.data[,5]),]

and get

           V1 V2 V3 V4 V5
4   0.6831790 -2 -1 -1  1
3   0.9862250 -2 -1  0  0
12  0.3984110 -2 -1  1  0
15  0.5192050 -2 -1  1  1
7   0.1629140 -1 -2 -1  0
16  0.0474172 -1 -1 -1  1
11  0.2958940 -1 -1  0  0
20 -0.2182780 -1  0  0  1
5  -0.5814570  0 -1 -1 -1
10 -0.2879470  0 -1 -1  0
1  -0.3798680  0  0 -1 -1
2  -0.1782780  0  0 -1 -1
9  -0.3613250  0  0 -1  0
17 -0.6458280  0  0 -1  0
8  -0.3417220  0  0  0 -1
14 -0.4037090  0  0  0 -1
18 -0.4018540  0  0  0 -1
19 -0.3748340  0  0  0  0
6  -0.3909930  0  1 -1  0
13 -0.7388080  1  1 -1  0

This produces the result I want (when the data frame is sorted by one column, the ordering from any previously sorted columns must be preserved), but written this way it is cluttered and inflexible with regard to the number of columns in the data frame I feed it. So let's say the variable "columns" holds the number of columns in that data frame. If I try, say,

test.data.sorted.2 <- test.data[order(test.data[,2:columns]),]

it gives me

           V1 V2 V3 V4 V5
3   0.9862250 -2 -1  0  0
4   0.6831790 -2 -1 -1  1
12  0.3984110 -2 -1  1  0
15  0.5192050 -2 -1  1  1
7   0.1629140 -1 -2 -1  0
11  0.2958940 -1 -1  0  0
16  0.0474172 -1 -1 -1  1
20 -0.2182780 -1  0  0  1
1  -0.3798680  0  0 -1 -1
2  -0.1782780  0  0 -1 -1
5  -0.5814570  0 -1 -1 -1
6  -0.3909930  0  1 -1  0
8  -0.3417220  0  0  0 -1
9  -0.3613250  0  0 -1  0
10 -0.2879470  0 -1 -1  0
14 -0.4037090  0  0  0 -1
17 -0.6458280  0  0 -1  0
18 -0.4018540  0  0  0 -1
19 -0.3748340  0  0  0  0
13 -0.7388080  1  1 -1  0

which only seems to have sorted the second column. Similarly, running a for loop like

for (i in 2:columns){
  test.data.sorted.3 <- test.data[order(test.data[,i]),]
}

I get the following:

           V1 V2 V3 V4 V5
1  -0.3798680  0  0 -1 -1
2  -0.1782780  0  0 -1 -1
5  -0.5814570  0 -1 -1 -1
8  -0.3417220  0  0  0 -1
14 -0.4037090  0  0  0 -1
18 -0.4018540  0  0  0 -1
3   0.9862250 -2 -1  0  0
6  -0.3909930  0  1 -1  0
7   0.1629140 -1 -2 -1  0
9  -0.3613250  0  0 -1  0
10 -0.2879470  0 -1 -1  0
11  0.2958940 -1 -1  0  0
12  0.3984110 -2 -1  1  0
13 -0.7388080  1  1 -1  0
17 -0.6458280  0  0 -1  0
19 -0.3748340  0  0  0  0
4   0.6831790 -2 -1 -1  1
15  0.5192050 -2 -1  1  1
16  0.0474172 -1 -1 -1  1
20 -0.2182780 -1  0  0  1

which isn't what I'm looking for, either. The question is: how do I achieve the same result as the first example while still keeping the number of columns that I might have to go through flexible?

Upvotes: 0

Views: 48

Answers (1)

Elin

Reputation: 6770

Your first example is equivalent to

 iris[order(iris[,2], iris[,3], iris[,4]),]

This

 iris[order(iris[,2:4]),]

is what you tried, and it doesn't work as expected, so it won't work for an unknown number of columns either. (It fails because you are supplying a single data frame rather than a comma-separated set of vectors.)

?order is a bit confusing to read, but order() needs its sort keys as separate, comma-separated vector arguments. The help file suggests using do.call:

 iris[do.call(order, c(iris[2:length(iris)])),]

should do the trick.
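Applied to a data frame shaped like the one in the question, the do.call approach might look like the sketch below. The sample data and the helper name sort_by_trailing_columns are made up for illustration, and the sketch assumes the data frame has at least two columns. The unname() call is a precaution: do.call passes list names through as argument names, so a column that happened to be called "decreasing" or "method" would otherwise be taken as one of order()'s own arguments.

```r
# Hypothetical sample data: one value column followed by integer key columns.
test.data <- data.frame(
  V1 = c(0.5, -0.2, 0.9, 0.1, -0.7, 0.3),
  V2 = c(0L, -1L, 0L, -2L, -1L, 0L),
  V3 = c(1L, 0L, -1L, 0L, 0L, 1L),
  V4 = c(0L, 1L, 0L, -1L, -1L, -1L)
)

# Sort by every column after the first, however many there are.
# (Hypothetical helper; assumes ncol(df) >= 2.)
sort_by_trailing_columns <- function(df) {
  # Turn the key columns into an unnamed list so each one becomes
  # a positional argument to order().
  keys <- unname(as.list(df[-1]))
  df[do.call(order, keys), , drop = FALSE]
}

sorted <- sort_by_trailing_columns(test.data)

# The hand-written version with every column spelled out, for comparison.
explicit <- test.data[order(test.data[, 2], test.data[, 3], test.data[, 4]), ]
stopifnot(identical(rownames(sorted), rownames(explicit)))
print(sorted)
```

Because the key columns are selected with df[-1], nothing in the helper depends on how many columns the data frame has.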

To me the help was confusing in the way a lot of the old help files are: they don't explain much.
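As for the for loop in the question: each pass sorts the original test.data again, so only the last column's ordering survives. If a loop is preferred, one possible repair is to keep re-sorting the running result and to walk the key columns from last to first. This is only a sketch on made-up data, and it relies on order() leaving tied rows in their existing order, as ?order describes.

```r
# Hypothetical sample data, not the data from the question.
df <- data.frame(
  V1 = c(0.5, -0.2, 0.9, 0.1, -0.7, 0.3),
  V2 = c(0L, -1L, 0L, -2L, -1L, 0L),
  V3 = c(1L, 0L, -1L, 0L, 0L, 1L)
)

loop.sorted <- df
# Least significant key first, most significant key last.
# Each pass re-sorts the running result, not the original data frame.
for (i in rev(2:ncol(df))) {
  loop.sorted <- loop.sorted[order(loop.sorted[, i]), ]
}

# The do.call version, for comparison.
reference <- df[do.call(order, unname(as.list(df[-1]))), ]
stopifnot(identical(rownames(loop.sorted), rownames(reference)))
```

The do.call version does the same job in a single call, so the loop is mainly useful for seeing why the original attempt lost the earlier orderings.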

Upvotes: 1
