TMOTTM

Reputation: 3381

Order a data.table

I have a data.frame

> variable_importance
      Overall
x.1  87.30483
x.2  88.59212
x.3  34.16171
x.4  35.72880
x.5  50.62831
x.6  44.76673
x.7  31.12285
x.8  43.04628
x.9  33.01750
x.10 30.72718

I would like to order the data frame by the Overall variable, but such that the x.? identifiers remain with their respective values.

I.e. it should end up as

x.2  88.59212
x.1  87.30483
x.5  50.62831
[...]

order just gives me the indices of the rearranged data frame, and I lose the row identifiers.

How can I do this and is there a solution using the data.table library?

Upvotes: 1

Views: 353

Answers (4)

Arun

Reputation: 118779

From version 1.9.5 of data.table (currently devel), you can also use setorder() on a data.frame. It reorders the input object by reference.

require(data.table)
setorder(df, -Overall)
df
#       Overall
# x.2  88.59212
# x.1  87.30483
# x.5  50.62831
# x.6  44.76673
# x.8  43.04628
# x.4  35.72880
# x.3  34.16171
# x.9  33.01750
# x.7  31.12285
# x.10 30.72718

Check this answer for benchmarks on how setorder() is both fast and memory efficient.
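
To illustrate what "by reference" means here, the following is a minimal sketch contrasting setorder() with ordinary subsetting via order(). The names dt and sorted_copy are made up for the illustration, and the data is a three-row subset of the question's values with the identifiers stored in a column.

```r
library(data.table)

# Hypothetical three-row subset of the question's data, identifiers in a column
dt <- data.table(id = c("x.1", "x.2", "x.3"),
                 Overall = c(87.30483, 88.59212, 34.16171))

# Subsetting with order() returns a new, sorted object; dt itself is unchanged
sorted_copy <- dt[order(-Overall)]
original_ids <- dt$id    # still in the original order

# setorder() reorders dt in place, so no assignment is needed
setorder(dt, -Overall)
dt$id                    # now sorted by decreasing Overall
```

The in-place version avoids holding a second copy of the table in memory, which is where the savings on large data come from.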

Upvotes: 0

user1267127

You can use order to sort your data by the mentioned column; indexing whole rows keeps each identifier with its value:

data <- structure(list(V1 = structure(c(1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
10L, 2L), .Label = c("x.1", "x.10", "x.2", "x.3", "x.4", "x.5", 
"x.6", "x.7", "x.8", "x.9"), class = "factor"), V2 = c(87.30483, 
88.59212, 34.16171, 35.7288, 50.62831, 44.76673, 31.12285, 43.04628, 
33.0175, 30.72718)), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA, 
-10L))

data[order(-data$V2), ]  # rows sorted by decreasing V2

Upvotes: 0

Marco Breitig

Reputation: 201

A data.table example:

require("data.table")
variable_importance <- data.frame(Overall=c(87.30483,88.59212,34.16171,35.72880,50.62831,44.76673,31.12285,43.04628,33.01750,30.72718), row.names=paste0("x.",1:10))
variable_importance # show data.frame
dt <- as.data.table(variable_importance, keep.rownames=T) # new data.table, by value (copy)
#dt <- setDT(variable_importance, keep.rownames=T)  # new data.table, by reference (so variable_importance is now the same data.table, too)
setorder(dt, -Overall)  # order data.table reverse by column Overall
setnames(dt, "rn", "")  # delete colname "rn"
dt # show data.table

setDT converts variable_importance to a data.table by reference instead of copying it, which is much faster on huge data sets. When you convert the data.frame to a data.table you have to specify keep.rownames=T to get a new column called rn holding the original row names, because data.table numbers the rows automatically and does not keep row names. Normally, when working with data.table, you should not assign empty column names, since you refer to columns by name. It is better practice to make a new column called id.

setnames(dt, "", "rn")  # give the column its name back to work with it
dt[,id:=as.integer(substr(rn, start=3, stop=nchar(rn)))]  # extract numbers from rownames
dt[,rn:=NULL] # delete column rn
setcolorder(dt, c("id","Overall"))  # reorder columns
dt # show data.table

Upvotes: 0

Kodiologist

Reputation: 3470

Use order to index into variable.importance but also use drop = FALSE to avoid coercing the data frame to a vector and losing the row names:

> variable.importance[order(-variable.importance$Overall), , drop = FALSE]
      Overall
x.2  88.59212
x.1  87.30483
x.5  50.62831
x.6  44.76673
x.8  43.04628
x.4  35.72880
x.3  34.16171
x.9  33.01750
x.7  31.12285
x.10 30.72718
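
As a quick check of why drop = FALSE matters, here is a small sketch comparing the two forms of indexing. It uses a made-up three-row data frame named vi rather than the full data; the names vi, idx, dropped and kept are only for illustration.

```r
# Hypothetical three-row version of the data frame from the question
vi <- data.frame(Overall = c(87.30483, 88.59212, 34.16171),
                 row.names = c("x.1", "x.2", "x.3"))

idx <- order(-vi$Overall)

# Without drop = FALSE a one-column data frame collapses to a bare vector
dropped <- vi[idx, ]
is.data.frame(dropped)   # FALSE
names(dropped)           # NULL: the row names are gone

# With drop = FALSE the result stays a data frame and keeps its row names
kept <- vi[idx, , drop = FALSE]
rownames(kept)           # "x.2" "x.1" "x.3"
```

The collapse only happens when the data frame has a single column, which is why the same indexing works without drop = FALSE on wider data frames.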

Upvotes: 1
