Reputation: 313
So let's say I have the following data.frames:
df1 <- data.frame(y = 1:10, z = rnorm(10), row.names = letters[1:10])
df2 <- data.frame(y = c(rep(2, 5), rep(5, 5)), z = rnorm(10),
row.names = letters[1:10])
And perhaps the "equivalent" data.tables:
library(data.table)
dt1 <- data.table(x = rownames(df1), df1, key = 'x')
dt2 <- data.table(x = rownames(df2), df2, key = 'x')
If I want to do element-wise operations between df1 and df2, they look something like:
dfRes <- df1 / df2
And the rownames() are preserved:
R> head(dfRes)
y z
a 0.5 3.1405463
b 1.0 1.2925200
c 1.5 1.4137930
d 2.0 -0.5532855
e 2.5 -0.0998303
f 1.2 -1.6236294
My poor understanding of data.table says the same operation should look like this:
dtRes <- dt1[, !'x', with = FALSE] / dt2[, !'x', with = FALSE]
dtRes[, x := dt1[, x]]
setkey(dtRes, x)
(The setkey() call is optional.)
Is there a more data.table-esque way of doing this?
As a slightly related aside: more generally, each data.table would also have other columns, such as factors. I would like to omit those columns from the element-wise operations but still keep them in the result. Does this make sense?
Thanks!
Upvotes: 1
Views: 545
Reputation: 118789
IMO the proper way to do this would be with a join: it takes care of matching rows on the key column x (which holds the former row names) correctly.
I'll illustrate using data.table v1.9.3. You can find the install instructions on the GitHub project page.
## 1.9.3
dt1[dt2, list(x, y=y/i.y, z=z/i.z)]
# x y z
# 1: a 0.5 6.20339701
# 2: b 1.0 1.72701257
# 3: c 1.5 0.11444594
# 4: d 2.0 -0.70715087
# 5: e 2.5 -0.41692176
# 6: f 1.2 0.07033400
# 7: g 1.4 0.45198379
# 8: h 1.6 -0.04762567
# 9: i 1.8 -1.46270143
# 10: j 2.0 -0.92588495
During the join, i.y and i.z refer to the columns y and z of dt2 (the data.table passed as i), respectively.
If you have many more columns, you can construct the expression programmatically and evaluate it. You can find many such posts here on SO under the [r] [data.table] tags.
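A minimal sketch of building and evaluating such an expression, assuming a current data.table release (v1.9.3+ join semantics). It also touches on the aside in the question: the factor column g is carried into the result without being divided. The column g, the helper names num_cols, other_cols and j_expr, and the seed are illustrative assumptions, not from the question. The j expression is built with base R's call() and as.call() and passed wrapped in eval(), which data.table evaluates as if the expression had been typed into j directly.

```r
library(data.table)

set.seed(42)  # only so the random columns are reproducible
# Hypothetical tables: same layout as in the question, plus a factor column g
dt1 <- data.table(x = letters[1:10], g = factor(rep(c("p", "q"), 5)),
                  y = 1:10, z = rnorm(10), key = "x")
dt2 <- data.table(x = letters[1:10], g = factor(rep(c("p", "q"), 5)),
                  y = c(rep(2, 5), rep(5, 5)), z = rnorm(10), key = "x")

# Numeric, non-key columns get divided; everything else (x, g) is kept as-is
num_cols <- setdiff(names(dt1)[vapply(dt1, is.numeric, logical(1))], key(dt1))
other_cols <- setdiff(names(dt1), num_cols)

# Build the call list(x = x, g = g, y = y / i.y, z = z / i.z)
ratio_exprs <- lapply(num_cols, function(col)
  call("/", as.name(col), as.name(paste0("i.", col))))
names(ratio_exprs) <- num_cols
keep_exprs <- lapply(other_cols, as.name)
names(keep_exprs) <- other_cols
j_expr <- as.call(c(as.name("list"), keep_exprs, ratio_exprs))

# Join dt1 with dt2 on the key x and evaluate the constructed j
res <- dt1[dt2, eval(j_expr)]
res
```

Selecting the columns with is.numeric means new numeric columns are picked up automatically, while character and factor columns are passed through untouched.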
If you'd like to stick to the CRAN version (1.9.2), then you can do:
## 1.9.2
dt1[dt2, list(y=y/i.y, z=z/i.z)]
You don't need x in j there, since that version returns the key columns by default.
For those interested in the difference between the two versions:
In versions < 1.9.3 of data.table, a join of the form x[i, list(...)], that is, one where j (here list(...)) is provided, implicitly performed a by-without-by operation: it computed j for each value of i's key columns that matched with x. While this is a great feature, there was no way to opt out of it. As a result, operations where by-without-by was not necessary were slightly slower. Therefore, in v1.9.3+, we've replaced the implicit-by (or by-without-by) feature with one that has to be requested explicitly, as by = .EACHI. That is, in v1.9.3+:
x[i, list(...)]
will first compute i and then j, once on the whole result, not for each i. And
x[i, list(...), by = .EACHI]
will compute j for each row of i, which is equivalent to the way joins were performed in versions < 1.9.3.
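A small sketch of the difference, assuming a current data.table release (these semantics apply from v1.9.3 on). The tables X and Y and their columns k and v are hypothetical toy data, not from the question. X has two rows per key value, so computing j once versus once per row of i gives visibly different results; the by = .EACHI result is what a plain x[i, j] join returned before 1.9.3.

```r
library(data.table)

# Hypothetical toy tables: X has two rows per key value, Y selects keys "a" and "b"
X <- data.table(k = c("a", "a", "b", "b", "c"), v = 1:5, key = "k")
Y <- data.table(k = c("a", "b"), key = "k")

# Without by: j is computed once, over all rows of X matched by Y
total <- X[Y, sum(v)]
total                        # 1 + 2 + 3 + 4 = 10

# With by = .EACHI: j is computed separately for each row of Y
per_key <- X[Y, sum(v), by = .EACHI]
per_key                      # k = "a" -> 3, k = "b" -> 7
```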
Hope this helps you understand the difference.
Upvotes: 4