user338714
user338714

Reputation: 2375

rowwise look-up in R using data.table

I have a table of ranked factors and corresponding values by date, but I'd like to also present the data by rank instead of factor. In the first step, I determine the corresponding factor for each rank (.fact columns in output), and in the second step, I use the factor to determine the corresponding value for each rank (.val2 column in output). The actual dataset includes many more dates and factors (names and count vary). Each row corresponds to the model predicted rank for each factor on a specific date, and the value data is the realized value.

The code below works, but is there a more efficient (and faster) way to accomplish the row wise look-up operations below? Many of the data.table suggestions I've read discourage the use of .SD[, with=FALSE], but I've yet to find another solution.

library(data.table)
dt = data.table(Date = c("1/31/2013", "2/28/2013", "3/31/2013", 
                         "4/30/2013", "5/31/2013"), 
                A.rnk = c(5L, 2L, 2L, 3L, 3L), 
                B.rnk = c(4L, 3L, 1L, 2L, 5L), 
                C.rnk = c(3L, 1L, 4L, 1L, 1L), 
                D.rnk = c(2L, 4L, 3L, 5L, 2L),
                E.rnk = c(1L, 5L, 5L, 4L, 4L),
                A.val = rnorm(5), B.val = rnorm(5), 
                C.val = rnorm(5), D.val = rnorm(5),
                E.val = rnorm(5))
nms = c("A", "B", "C", "D", "E")
rnks = as.character(1:5)

# determine the factor (A,B,C) for each rank, by date
dt = dt[, c(rnks):={
  cols = .SD[, paste0(nms, ".rnk"), with=FALSE]
  cols = names(cols)[order(cols)]
  as.list(stringr::str_extract(cols, stringr::perl(".{1}(?=.rnk)")))
}, by=Date]

# determine the factor value (val) for each rank, by date
dt[, paste0(rnks, ".val2"):=
     .SD[, paste0(.SD[, rnks, with=FALSE], ".val"), with=FALSE], by=Date]

Upvotes: 2

Views: 296

Answers (1)

Arun
Arun

Reputation: 118779

Here's a complete rewrite. For your requirements, it's better to work with the data in long format, as shown below. It uses reshape2 package which has the functions melt and dcast to convert the data into long and wide format respectively.

Note that faster versions of melt and dcast are implemented (in C) in the current development version (1.8.11) of data.table. So, after the next release of data.table, you can use the same code, but you don't have to convert it back to a data.table (done using as.dat.table shown below) after the melt and dcast step + it'll be a lot faster.

Now on to the solution:

# loading packages
require(data.table)
require(reshape2)

# long format on just the .rnk columns
dt.m <- as.data.table(melt(dt, id="Date", measure=2:6))
setnames(dt.m, c("Date", "var1", "val1"))
dt.m[, c("var2", "val2") := as.data.table(melt(dt, id="Date", 
               measure=7:11))[, list(variable, value)]]
# sort by date column by reference
setkey(dt.m, Date)

# here you can alternatively use `order`, but `fastorder` is well, faster
oo <- data.table:::fastorder(as.list(dt.m)[c("Date","val1")])
dt.m[, val3 := rep(nms, length.out=length(oo))[oo]]
dt.m[, val4 := val2[val1], by=Date]

ans1 <- as.data.table(dcast(dt.m, Date ~ var1, value.var="val3"))[, Date := NULL]
ans2 <- as.data.table(dcast(dt.m, Date ~ var2, value.var="val4"))[, Date := NULL]
setnames(ans1, rnks)
setnames(ans2, paste(rnks, ".val2", sep=""))
cbind(dt, ans1, ans2)

Upvotes: 2

Related Questions