Mahmoud

Sorting a data.table by Multiple Columns

Consider the data.table below:

library(data.table)
DT <- data.table(a=c(1,2,4,3,5), b=c(3:5,NA,2), c=c(2,1,NA,NA,3))
DT
   a  b  c
1: 1  3  2
2: 2  4  1
3: 4  5 NA
4: 3 NA NA
5: 5  2  3

I want to sort the rows by the 3rd column and then by the 1st column. I can do it with:

DT[order(DT[,3],DT[,1])]

   a  b  c
1: 2  4  1
2: 1  3  2
3: 5  2  3
4: 3 NA NA
5: 4  5 NA
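A variant of the call above that does not depend on how order() treats a one-column data.table: DT[[j]] extracts column j as a plain vector (whereas DT[, j] returns a one-column data.table in data.table 1.9.8 and later), so each column becomes its own sort key. This is a minimal sketch, assuming the data.table package is installed:

```r
library(data.table)

DT <- data.table(a = c(1, 2, 4, 3, 5), b = c(3:5, NA, 2), c = c(2, 1, NA, NA, 3))

# DT[[3]] and DT[[1]] are plain vectors, so order() receives two separate
# sort keys: column 3 first, ties broken by column 1; NAs are placed last.
res <- DT[order(DT[[3]], DT[[1]])]
print(res)
```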

But if DT has many columns and, let's say, I want to sort by the 1st through i-th columns, it becomes tedious to write it out as:

DT[order(DT[,1], DT[,2], DT[,3], ... DT[,i])]

Instead, I'd like to provide the column indices as a vector:

DT[order(DT[,c(1:i)])]

But it doesn't work as I expect. For example, with columns 3 and 1 the output is:

DT[order(DT[,c(3,1)])]

     a  b  c
 1:  2  4  1
 2: NA NA NA
 3:  1  3  2
 4: NA NA NA
 5:  5  2  3
 6: NA NA NA
 7: NA NA NA
 8: NA NA NA
 9:  4  5 NA
10:  3 NA NA
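A quick check, assuming data.table 1.9.8 or later, shows why this differs from the working call: DT[, c(3, 1)] returns a single two-column data.table rather than two separate vectors, so order() does not get one sort key per column. The 10-row result for a 5-row table suggests that 10 indices came back, and indices beyond row 5 produce the all-NA rows.

```r
library(data.table)

DT <- data.table(a = c(1, 2, 4, 3, 5), b = c(3:5, NA, 2), c = c(2, 1, NA, NA, 3))

sel <- DT[, c(3, 1)]
class(sel)   # "data.table" "data.frame": one object, not two vectors
names(sel)   # "c" "a"
nrow(sel)    # 5
```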

Any advice on how I can fix that? Thanks!


Answers (1)

akrun

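We can select the sort columns with .SDcols and then use do.call to pass each column of .SD to order as a separate argument. The inner DT[...] returns the row order, which the outer DT[...] then applies: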

DT[DT[,do.call(order, .SD), .SDcols = c(3, 1)]]
#   a  b  c
#1: 2  4  1
#2: 1  3  2
#3: 5  2  3
#4: 3 NA NA
#5: 4  5 NA
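The same idea extends to the 1st-through-i-th case from the question. This is a minimal sketch, assuming data.table 1.9.8 or later; sort_by_cols is a hypothetical helper name, not part of data.table. It drops the column names before do.call so that a column called, say, method or decreasing cannot be mistaken for one of order()'s own arguments. It also shows data.table's setorderv(), which reorders DT in place by column names; setorderv() puts NAs first by default, so na.last = TRUE is needed to match the output above.

```r
library(data.table)

DT <- data.table(a = c(1, 2, 4, 3, 5), b = c(3:5, NA, 2), c = c(2, 1, NA, NA, 3))

# Hypothetical helper: sort rows by the columns at positions `idx`,
# in the order given (the first index is the primary key).
sort_by_cols <- function(dt, idx) {
  keys <- unname(as.list(dt[, idx, with = FALSE]))  # one plain vector per key column
  ord <- do.call(order, keys)                       # row order, NAs last
  dt[ord]                                           # returns a reordered copy
}

by_c_then_a  <- sort_by_cols(DT, c(3, 1))
by_first_two <- sort_by_cols(DT, 1:2)

# In-place alternative: setorderv() reorders DT by reference.
setorderv(DT, names(DT)[c(3, 1)], na.last = TRUE)
print(DT)
```

sort_by_cols leaves DT untouched and returns a new table, while setorderv() avoids the copy, which can matter for large tables.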
