Reputation: 10399
I have a data frame with 900,000 rows and 11 columns in R. The column names and types are as follows:
column name: date / mcode / mname / ycode / yname / yissue / bsent / breturn / tsent / treturn / csales
type: Date / Char / Char / Char / Char / Numeric / Numeric / Numeric / Numeric / Numeric / Numeric
I want to sort the data by those variables in the following order:
The order of the levels is important here: rows should be sorted by date first, and where dates are identical, by mcode, and so forth. How can I do that in R?
Upvotes: 5
Views: 1456
Reputation: 11
Additional notes: to reverse the sort on a factor or character column, wrap it in -xtfrm(); a bare minus sign only works on numeric columns.
with(df, df[order(a, b, -xtfrm(myCharCol)), ])
You can also supply a vector of column names to keep only certain columns:
with(df, df[order(a, b, c), c('a','b','x','y')])
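A minimal sketch of both notes together, on made-up data (the data frame and its column names are hypothetical). Unary minus is not defined for strings, so the character key is passed through xtfrm() first, which turns it into sortable numbers:

```r
# Hypothetical toy data; the column names are invented for illustration.
df <- data.frame(
  a = c(1, 1, 2, 2),
  myCharCol = c("x", "y", "x", "y"),
  x = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Ascending by a, then descending by the character column.
# The second index keeps only the listed columns.
sorted <- df[order(df$a, -xtfrm(df$myCharCol)), c("a", "myCharCol", "x")]
print(sorted)
```

Within each value of a, the row with "y" should come before the row with "x".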
Upvotes: 1
Reputation: 60756
If none of the above answers light your fire, you can always use the orderBy() function from the doBy package:
library(doBy)
sortedData <- orderBy(~date+mcode+ycode+yissue , data=unsortedData)
As you might intuitively expect, you can put a negative sign in front of any variable to sort it descending.
There's nothing magical about orderBy(). As the documentation states, it is a "wrapper for the order() function - the important difference being that variables to order by can be given by a model formula."
I find the syntax easier to remember.
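Since orderBy() wraps order(), the same sort can be written in base R without the package. This is only a sketch: the sample values below are invented, and just the four key columns from the formula are included.

```r
# Hypothetical sample with the question's key columns; the values are invented.
unsortedData <- data.frame(
  date   = as.Date(c("2010-02-01", "2010-01-01", "2010-01-01", "2010-01-01")),
  mcode  = c("M1", "M2", "M1", "M1"),
  ycode  = c("Y1", "Y1", "Y2", "Y1"),
  yissue = c(1, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Same key order as the formula ~date+mcode+ycode+yissue:
# date first, then mcode to break ties, then ycode, then yissue.
sortedData <- unsortedData[with(unsortedData, order(date, mcode, ycode, yissue)), ]
print(sortedData)
```

The formula version reads more compactly, but the base R call makes the tie-breaking order explicit in the argument list.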
Upvotes: 4
Reputation: 894
Perhaps something like this?
> df <- data.frame(a=rev(1:10), b=rep(c(2,1),5), c=rnorm(10))
> df
    a b           c
1  10 2 -0.85212079
2   9 1 -0.46199463
3   8 2 -1.52374565
4   7 1  0.28904717
5   6 2 -0.91609012
6   5 1  1.60448783
7   4 2  0.51249796
8   3 1 -1.35119089
9   2 2 -0.55497745
10  1 1 -0.05723538
> with(df, df[order(a, b, c), ])
    a b           c
10  1 1 -0.05723538
9   2 2 -0.55497745
8   3 1 -1.35119089
7   4 2  0.51249796
6   5 1  1.60448783
5   6 2 -0.91609012
4   7 1  0.28904717
3   8 2 -1.52374565
2   9 1 -0.46199463
1  10 2 -0.85212079
The order() function can take several vectors as arguments; each later vector is used only to break ties in the earlier ones.
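Because column a has no duplicates in the example above, the later keys never come into play. A small sketch with tied values (hypothetical data) shows the tie-breaking, and also one way to mix sort directions: passing one decreasing flag per key, which order() accepts only with method = "radix".

```r
# Hypothetical data with ties in the first key.
df <- data.frame(
  a = c(2, 1, 2, 1),
  b = c("u", "v", "w", "u"),
  stringsAsFactors = FALSE
)

# a ascending; ties in a are broken by b ascending.
asc <- df[order(df$a, df$b), ]

# a ascending, b descending: a vector-valued 'decreasing'
# (one flag per key) requires method = "radix".
mixed <- df[order(df$a, df$b, decreasing = c(FALSE, TRUE), method = "radix"), ]

print(asc)
print(mixed)
```

Note that the radix method compares strings in the C locale, so results for accented or mixed-case text may differ from the default method.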
Upvotes: 11
Reputation: 55735
Building on the earlier solution, here are two other approaches. The second approach requires plyr.
df.sorted <- df[do.call(order, df[names(df)]), ]  # base R: sort by every column, left to right
library(plyr)  # provides arrange()
df.sorted <- arrange(df, a, b, c)
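The do.call() form is handy when the sort columns are only known at run time. A sketch, assuming the key names are held in a character vector (the variable name keys and the data are hypothetical):

```r
# Hypothetical data and key selection.
df <- data.frame(a = c(2, 1, 2), b = c(1, 5, 0), c = c(9, 8, 7))
keys <- c("a", "b")

# Pass the chosen columns to order() as positional arguments.
# unname() drops the column names so that none of them can be
# mistaken for a named argument of order(), such as 'decreasing'.
df.sorted <- df[do.call(order, unname(as.list(df[keys]))), ]
print(df.sorted)
```

Changing keys changes the sort without touching the call itself.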
Upvotes: 8