Reputation: 3381
I have a data.frame
> variable_importance
Overall
x.1 87.30483
x.2 88.59212
x.3 34.16171
x.4 35.72880
x.5 50.62831
x.6 44.76673
x.7 31.12285
x.8 43.04628
x.9 33.01750
x.10 30.72718
I would like to order the data frame by the Overall variable, but such that the x.? identifiers remain with their respective values.
I.e. it should end up as
x.2 88.59212
x.1 87.30483
x.5 50.62831
[...]
order just gives me the indices of the rearranged data frame, and I lose the row identifiers.
How can I do this, and is there a solution using the data.table library?
Upvotes: 1
Views: 353
Reputation: 118779
From version 1.9.5 of data.table (currently devel), you can also use setorder() on a data.frame. It reorders the input object by reference.
require(data.table)
setorder(df, -Overall)
df
# Overall
# x.2 88.59212
# x.1 87.30483
# x.5 50.62831
# x.6 44.76673
# x.8 43.04628
# x.4 35.72880
# x.3 34.16171
# x.9 33.01750
# x.7 31.12285
# x.10 30.72718
Check this answer for benchmarks on how setorder() is both fast and memory efficient.
Upvotes: 0
Reputation:
You can sort your data by the column in question by indexing with order (apply(data, 2, sort) would sort each column independently and break the pairing between identifiers and values):
data <- structure(list(V1 = structure(c(1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 2L), .Label = c("x.1", "x.10", "x.2", "x.3", "x.4", "x.5",
"x.6", "x.7", "x.8", "x.9"), class = "factor"), V2 = c(87.30483,
88.59212, 34.16171, 35.7288, 50.62831, 44.76673, 31.12285, 43.04628,
33.0175, 30.72718)), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA,
-10L))
data[order(-data$V2), ] # reorder whole rows by V2, descending
Upvotes: 0
Reputation: 201
A data.table example:
require("data.table")
variable_importance <- data.frame(Overall=c(87.30483,88.59212,34.16171,35.72880,50.62831,44.76673,31.12285,43.04628,33.01750,30.72718), row.names=paste0("x.",1:10))
variable_importance # show data.frame
dt <- as.data.table(variable_importance, keep.rownames=T) # new data.table, by value (copy)
#dt <- setDT(variable_importance, keep.rownames=T) # new data.table, by reference (so variable_importance is now the same data.table, too)
setorder(dt, -Overall) # order data.table reverse by column Overall
setnames(dt, "rn", "") # delete colname "rn"
dt # show data.table
setDT promotes variable_importance to a data.table by reference, without copying it, which is much faster on huge data sets.
When you transform the data.frame to a data.table, you have to specify keep.rownames=T. You then get a new column called rn with the original row names, because data.table automatically numbers the rows.
Normally, when working with data.table, you should not assign empty column names, since you use the names to work with the columns. It is better practice to make a new column called id.
setnames(dt, "", "rn") # give the column its name back to work with it
dt[,id:=as.integer(substr(rn, start=3, stop=nchar(rn)))] # extract numbers from rownames
dt[,rn:=NULL] # delete column rn
setcolorder(dt, c("id","Overall")) # reorder columns
dt # show data.table
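As a shorter variant of the steps above, the row names can be captured directly in a named column and the table sorted without modifying it in place. This is only a sketch: it assumes a data.table version in which keep.rownames accepts a column name, and here id holds the full "x.N" string rather than the extracted integer.

```r
library(data.table)

variable_importance <- data.frame(
  Overall = c(87.30483, 88.59212, 34.16171, 35.72880, 50.62831,
              44.76673, 31.12285, 43.04628, 33.01750, 30.72718),
  row.names = paste0("x.", 1:10)
)

# Copy into a data.table, storing the row names in a column called "id"
# (assumption: keep.rownames given as a string names the new column).
dt <- as.data.table(variable_importance, keep.rownames = "id")

# order() inside [ ] returns a new, sorted data.table; dt itself is unchanged.
sorted_dt <- dt[order(-Overall)]
print(sorted_dt)
```

Unlike setorder(), which reorders by reference, this form leaves dt in its original order, which may be preferable when the unsorted table is still needed.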
Upvotes: 0
Reputation: 3470
Use order to index into variable.importance, but also use drop = FALSE to avoid coercing the data frame to a vector and losing the row names:
> variable.importance[order(-variable.importance$Overall), , drop = FALSE]
Overall
x.2 88.59212
x.1 87.30483
x.5 50.62831
x.6 44.76673
x.8 43.04628
x.4 35.72880
x.3 34.16171
x.9 33.01750
x.7 31.12285
x.10 30.72718
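To see what drop = FALSE changes, here is a small self-contained sketch in base R. It rebuilds the data frame from the question under the name variable_importance; the names idx, sorted and dropped are only illustrative.

```r
variable_importance <- data.frame(
  Overall = c(87.30483, 88.59212, 34.16171, 35.72880, 50.62831,
              44.76673, 31.12285, 43.04628, 33.01750, 30.72718),
  row.names = paste0("x.", 1:10)
)

# Row positions that put Overall in descending order.
idx <- order(-variable_importance$Overall)

# With drop = FALSE the result stays a data frame and keeps its row names.
sorted <- variable_importance[idx, , drop = FALSE]

# Without it, a one-column data frame collapses to a plain numeric vector,
# and the row identifiers are gone.
dropped <- variable_importance[idx, ]

print(head(sorted, 3))
print(is.data.frame(sorted))
print(is.null(names(dropped)))
```

The collapse only happens because the data frame has a single column; with two or more columns, row indexing returns a data frame either way.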
Upvotes: 1