Reputation: 609
I have a data-table with pairs of columns:
set.seed(1)
dt <- data.table(a1 = round(runif(3),1), a2 = round(runif(3),1), a3 =round(runif(3),1),
b1 = round(runif(3),1), b2 = round(runif(3),1), b3 =round(runif(3),1))
it looks like this:
a1 a2 a3 b1 b2 b3
0.3 0.9 0.9 0.1 0.7 0.5
0.4 0.2 0.7 0.2 0.4 0.7
0.6 0.9 0.6 0.2 0.8 1.0
Now I would like to define a subset of columns:
n<-2
cols <- paste0(rep(letters[1:2], n),rep(1:n,each=2))
"a1" "b1" "a2" "b2"
and add a new column for each pair with the difference of the pair:
dt[,.(desired_ab1=c(0.2,0.2,0.4),desired_ab2=c(0.2,-0.2,0.1))]
desired_ab1 desired_ab2
0.2 0.2
0.2 -0.2
0.4 0.1
my goal looks similar to a previous post: Sum pairs of columns by group, so I tried the following:
myNames <- paste0("ab_", seq(n))
dt[, (myNames):=lapply(1:(ncol(.SD)/2),
function(x) (.SD[2*(x-1)+1]-.SD[2*(x-1)+2])), .SDcols=cols][]
I don't see what happens. Can someone give me a hint where this fails?
Upvotes: 0
Views: 1215
Reputation: 42544
The arithmetic to access the columns looks rather complicated and is fragile if the columns are ordered differently.
By reshaping the data from wide to long form the calculation can be simplified. The data in long form is the result of melting two measure columns simultaneously
cols <- c("a", "b")
melt(dt, measure.vars = patterns(cols), value.name = cols)]
variable a b 1: 1 0.3 0.1 2: 1 0.4 0.2 3: 1 0.6 0.2 4: 2 0.9 0.7 5: 2 0.2 0.4 6: 2 0.9 0.8 7: 3 0.9 0.5 8: 3 0.7 0.7 9: 3 0.6 1.0
Now, only the difference between columns a
and b
needs to be calculated. Finally, the result can be reshaped to wide form again (after filtering for the desired pairs.
If all parts are put together this becomes
# reshape from wide to long format with two measure variables
melt(dt, measure.vars = patterns(cols), value.name = cols)[
# calculate differences
, ab := a - b][
# select pairs of interest
variable %in% 1:2,
# reshape from long to wide format
dcast(.SD, rowid(variable) ~ paste0("ab", variable), value.var = "ab")][
# drop row id
, -"variable"]
ab1 ab2 1: 0.2 0.2 2: 0.2 -0.2 3: 0.4 0.1
Upvotes: 2
Reputation: 17289
.SD
is also a data.table
. You have to use [[
to subset a column (or use [, j, with = FALSE]
). So this: function(x) (.SD[[2*(x-1)+1]]-.SD[[2*(x-1)+2]])
will work for you case. And when you use single [
, it is actually subsetting a row not a column from .SD
(like what happens for a normal data.table
).
Upvotes: 2