Reputation: 2375
I have a table of ranked factors and corresponding values by date, but I'd like to also present the data by rank instead of factor. In the first step, I determine the corresponding factor for each rank (.fact columns in output), and in the second step, I use the factor to determine the corresponding value for each rank (.val2 column in output). The actual dataset includes many more dates and factors (names and count vary). Each row corresponds to the model predicted rank for each factor on a specific date, and the value data is the realized value.
The code below works, but is there a more efficient (and faster) way to accomplish the row wise look-up operations below? Many of the data.table
suggestions I've read discourage the use of .SD[, with=FALSE]
, but I've yet to find another solution.
library(data.table)
dt = data.table(Date = c("1/31/2013", "2/28/2013", "3/31/2013",
"4/30/2013", "5/31/2013"),
A.rnk = c(5L, 2L, 2L, 3L, 3L),
B.rnk = c(4L, 3L, 1L, 2L, 5L),
C.rnk = c(3L, 1L, 4L, 1L, 1L),
D.rnk = c(2L, 4L, 3L, 5L, 2L),
E.rnk = c(1L, 5L, 5L, 4L, 4L),
A.val = rnorm(5), B.val = rnorm(5),
C.val = rnorm(5), D.val = rnorm(5),
E.val = rnorm(5))
nms = c("A", "B", "C", "D", "E")
rnks = as.character(1:5)
# determine the factor (A,B,C) for each rank, by date
dt = dt[, c(rnks):={
cols = .SD[, paste0(nms, ".rnk"), with=FALSE]
cols = names(cols)[order(cols)]
as.list(stringr::str_extract(cols, stringr::perl(".{1}(?=.rnk)")))
}, by=Date]
# determine the factor value (val) for each rank, by date
dt[, paste0(rnks, ".val2"):=
.SD[, paste0(.SD[, rnks, with=FALSE], ".val"), with=FALSE], by=Date]
Upvotes: 2
Views: 296
Reputation: 118779
Here's a complete rewrite. For your requirements, it's better to work with the data in long format, as shown below. It uses reshape2
package which has the functions melt
and dcast
to convert the data into long and wide format respectively.
Note that faster versions of
melt
anddcast
are implemented (in C) in the current development version (1.8.11) ofdata.table
. So, after the next release ofdata.table
, you can use the same code, but you don't have to convert it back to adata.table
(done usingas.dat.table
shown below) after themelt
anddcast
step+
it'll be a lot faster.
Now on to the solution:
# loading packages
require(data.table)
require(reshape2)
# long format on just the .rnk columns
dt.m <- as.data.table(melt(dt, id="Date", measure=2:6))
setnames(dt.m, c("Date", "var1", "val1"))
dt.m[, c("var2", "val2") := as.data.table(melt(dt, id="Date",
measure=7:11))[, list(variable, value)]]
# sort by date column by reference
setkey(dt.m, Date)
# here you can alternatively use `order`, but `fastorder` is well, faster
oo <- data.table:::fastorder(as.list(dt.m)[c("Date","val1")])
dt.m[, val3 := rep(nms, length.out=length(oo))[oo]]
dt.m[, val4 := val2[val1], by=Date]
ans1 <- as.data.table(dcast(dt.m, Date ~ var1, value.var="val3"))[, Date := NULL]
ans2 <- as.data.table(dcast(dt.m, Date ~ var2, value.var="val4"))[, Date := NULL]
setnames(ans1, rnks)
setnames(ans2, paste(rnks, ".val2", sep=""))
cbind(dt, ans1, ans2)
Upvotes: 2