Mehper C. Palavuzlar

Reputation: 10399

Sorting data in R

I have a data frame with 900,000 rows and 11 columns in R. The column names and types are as follows:

column name: date / mcode / mname / ycode / yname / yissue  / bsent   / breturn / tsent   / treturn / csales
type:        Date / Char  / Char  / Char  / Char  / Numeric / Numeric / Numeric / Numeric / Numeric / Numeric

I want to sort the data by those variables in the following order:

  1. date
  2. mcode
  3. ycode
  4. yissue

The order of the levels is important here, i.e. the rows should be sorted by date first, and if there are identical dates, they should be sorted by mcode, and so on. How can I do that in R?

Upvotes: 5

Views: 1456

Answers (4)

Tomsim

Reputation: 11

Additional notes: to reverse-sort a factor or character column, negate its sort key with -xtfrm() (a plain minus sign fails on character vectors)

with(df, df[order(a, b, -xtfrm(myCharCol)), ])

You can also pass a vector of column names to keep only certain columns

with(df, df[order(a, b, c), c('a','b','x','y')])
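
A small sketch of a mixed ascending/descending sort combined with column selection. It assumes xtfrm() is used to turn the character column into sortable numbers before negating it; the data frame and its values are invented for the example.

```r
# Toy data; the column names follow the snippets above, the values are made up.
df <- data.frame(
  a = c(1, 1, 2, 2),
  myCharCol = c("x", "z", "y", "w"),
  stringsAsFactors = FALSE
)

# a ascending, myCharCol descending within each value of a;
# the second index keeps only the listed columns.
res <- with(df, df[order(a, -xtfrm(myCharCol)), c("a", "myCharCol")])
print(res)
```

Within a == 1 the rows come out as "z" then "x", and within a == 2 as "y" then "w".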

Upvotes: 1

JD Long

Reputation: 60756

If none of the above answers light your fire, you can always use the orderBy() function from the doBy package:

library(doBy)
sortedData <- orderBy(~ date + mcode + ycode + yissue, data = unsortedData)

As you might intuitively expect, you can put a negative sign in front of any variable to sort it descending.

There's nothing magical about orderBy(). As the documentation states, it is a "wrapper for the order() function - the important difference being that variables to order by can be given by a model formula."

I find the syntax easier to remember.
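
To make the "wrapper for order()" idea concrete, here is a minimal base-R sketch of how a formula could be turned into an order() call. The helper name order_by_formula is hypothetical, and this is not doBy's actual implementation; it only handles plain ascending terms, and the sample data is invented.

```r
# Hypothetical helper: sort `data` by the variables named in a one-sided formula.
order_by_formula <- function(formula, data) {
  vars <- all.vars(formula)             # e.g. c("date", "mcode")
  keys <- unname(as.list(data[vars]))   # drop names so they are not matched to order()'s own arguments
  data[do.call(order, keys), , drop = FALSE]
}

# Invented sample data using two of the question's column names.
unsortedData <- data.frame(
  date  = as.Date(c("2010-01-02", "2010-01-01", "2010-01-01")),
  mcode = c("B", "B", "A"),
  stringsAsFactors = FALSE
)

sortedData <- order_by_formula(~ date + mcode, data = unsortedData)
print(sortedData)
```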

Upvotes: 4

jbremnant

Reputation: 894

Perhaps something like this?

> df<- data.frame(a=rev(1:10), b=rep(c(2,1),5), c=rnorm(10))
> df
    a b           c
1  10 2 -0.85212079
2   9 1 -0.46199463
3   8 2 -1.52374565
4   7 1  0.28904717
5   6 2 -0.91609012
6   5 1  1.60448783
7   4 2  0.51249796
8   3 1 -1.35119089
9   2 2 -0.55497745
10  1 1 -0.05723538
> with(df, df[order(a, b, c), ])
    a b           c
10  1 1 -0.05723538
9   2 2 -0.55497745
8   3 1 -1.35119089
7   4 2  0.51249796
6   5 1  1.60448783
5   6 2 -0.91609012
4   7 1  0.28904717
3   8 2 -1.52374565
2   9 1 -0.46199463
1  10 2 -0.85212079

The order() function can take several vectors as arguments; each later vector is used only to break ties in the ones before it.
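
Applied to the columns from the question, the same idea might look like the sketch below. The small data frame `sales` is made up purely for illustration; only the column names date, mcode, ycode and yissue come from the question.

```r
# Toy data standing in for the 900,000-row data frame (values are invented).
sales <- data.frame(
  date   = as.Date(c("2010-03-02", "2010-03-01", "2010-03-01", "2010-03-01")),
  mcode  = c("M1", "M2", "M1", "M1"),
  ycode  = c("Y1", "Y1", "Y2", "Y1"),
  yissue = c(5, 1, 2, 3),
  stringsAsFactors = FALSE
)

# Sort by date first; ties are broken by mcode, then ycode, then yissue.
sales_sorted <- sales[order(sales$date, sales$mcode, sales$ycode, sales$yissue), ]
print(sales_sorted)
```

The original row names are kept, so they show where each row came from in the unsorted data.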

Upvotes: 11

Ramnath

Reputation: 55735

Building on the earlier solution, here are two other approaches. The second approach requires plyr.

df.sorted <- df[do.call(order, df[names(df)]), ]  # base R: sort by every column, left to right
library(plyr)
df.sorted <- arrange(df, a, b, c)                 # plyr: sort by a, then b, then c

Upvotes: 8
