replacing data.frame element-wise operations with data.table (that used rowname)

Question

So, let's say I have the following data.frames:

df1 <- data.frame(y = 1:10, z = rnorm(10), row.names = letters[1:10])
df2 <- data.frame(y = c(rep(2, 5), rep(5, 5)), z = rnorm(10),
    row.names = letters[1:10])

And perhaps the "equivalent" data.tables:

library(data.table)
dt1 <- data.table(x = rownames(df1), df1, key = 'x')
dt2 <- data.table(x = rownames(df2), df2, key = 'x')

If I want to do element-wise operations between df1 and df2, they look something like this:

dfRes <- df1 / df2

and the row names are preserved:

R> head(dfRes)
    y          z
a 0.5  3.1405463
b 1.0  1.2925200
c 1.5  1.4137930
d 2.0 -0.5532855
e 2.5 -0.0998303
f 1.2 -1.6236294

My poor understanding of data.table says the same operation should look like this:

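# arithmetic on two data.tables goes through Ops.data.frame and returns a
# plain data.frame, so convert back before using :=
dtRes <- as.data.table(dt1[, !'x', with = FALSE] / dt2[, !'x', with = FALSE])
dtRes[, x := dt1$x]
setkey(dtRes, x)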

(The setkey() call is optional.)

Is there a more data.table-esque way of doing this?

As a slightly related aside: more generally, each data.table would also have other columns, such as factors. I would like to leave those columns out of the element-wise operations but still keep them in the result. Does this make sense?

Thanks!

Arun · Accepted Answer

IMO the proper way to do this is with a join: it takes care of matching rows on the key column x (which holds the former row names), instead of relying on both tables having their rows in the same order.
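To see why the matching matters, here is a small sketch (written for a current data.table release, not the v1.9.3 this answer illustrates with). It reuses only the y column from the question; df2_rev, dt2_rev, pos and joined are hypothetical names for illustration. `/` on two data.frames pairs rows by position and simply keeps the left operand's row names, whereas the join pairs rows by x.

```r
library(data.table)

# Only the y column from the question
df1 <- data.frame(y = 1:10, row.names = letters[1:10])
df2 <- data.frame(y = c(rep(2, 5), rep(5, 5)), row.names = letters[1:10])

# The same rows as df2, stored in reverse order (hypothetical)
df2_rev <- df2[10:1, , drop = FALSE]

# `/` on data.frames works by position and keeps df1's row names,
# so row "a" of df1 is divided by row "j" of df2_rev
pos <- df1 / df2_rev

# dt1 is keyed on x; the reversed table is deliberately left unkeyed,
# so its first column (x) is matched against dt1's key
dt1 <- data.table(x = rownames(df1), df1, key = "x")
dt2_rev <- data.table(x = rownames(df2_rev), df2_rev)

# The join matches rows on x, so each row is divided by its own partner
joined <- dt1[dt2_rev, list(x, y = y / i.y)]

pos["a", "y"]        # 1 / 5: the wrong partner
joined[x == "a", y]  # 1 / 2: the matching row
```

The join result comes back in the row order of i (here j, i, ..., a), but every value is computed from correctly paired rows.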

I'll illustrate using data.table v1.9.3. You can find the install instructions on the GitHub project page.

## 1.9.3 
dt1[dt2, list(x, y=y/i.y, z=z/i.z)]
#      x   y           z
#  1: a 0.5  6.20339701
#  2: b 1.0  1.72701257
#  3: c 1.5  0.11444594
#  4: d 2.0 -0.70715087
#  5: e 2.5 -0.41692176
#  6: f 1.2  0.07033400
#  7: g 1.4  0.45198379
#  8: h 1.6 -0.04762567
#  9: i 1.8 -1.46270143
# 10: j 2.0 -0.92588495

During a join, i.y and i.z refer to the y and z columns of dt2 (the data.table passed as i), while the unprefixed y and z refer to dt1's columns.
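Newer data.table releases also let you name the join column with the `on` argument instead of setting keys first; `.()` is data.table's alias for `list()`. A minimal sketch of the same division, assuming a current release rather than 1.9.3 (the `set.seed()` call is only there to make the example reproducible):

```r
library(data.table)

set.seed(42)
df1 <- data.frame(y = 1:10, z = rnorm(10), row.names = letters[1:10])
df2 <- data.frame(y = c(rep(2, 5), rep(5, 5)), z = rnorm(10),
                  row.names = letters[1:10])

# No keys: the join column is given explicitly via `on`
dt1 <- data.table(x = rownames(df1), df1)
dt2 <- data.table(x = rownames(df2), df2)

# Unprefixed y and z come from dt1; i.y and i.z come from dt2 (the i table)
res <- dt1[dt2, on = "x", .(x, y = y / i.y, z = z / i.z)]

# Same numbers as the data.frame version, with x as an ordinary column
dfRes <- df1 / df2
all.equal(res$y, dfRes$y)
all.equal(res$z, dfRes$z)
```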

If you have many more columns, you can construct the j expression programmatically and evaluate it. There are many such posts here on SO under the [r] and [data.table] tags.
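One way to do that is sketched below, assuming a current data.table release. It builds the j expression as a call object with `call()` and `as.call()` instead of pasting strings, and relies on data.table replacing `eval(j_expr)` in j with the call stored in `j_expr` before running the join. The factor column `grp` and the names `num_cols`, `keep_cols` and `j_expr` are hypothetical; the sketch also covers the question's aside by dividing only the numeric columns and carrying the others (here x and grp) through unchanged.

```r
library(data.table)

set.seed(1)
# Hypothetical data: numeric y and z, plus a factor column grp to carry along
dt1 <- data.table(x = letters[1:10], y = 1:10, z = rnorm(10),
                  grp = factor(rep(c("u", "v"), 5)), key = "x")
dt2 <- data.table(x = letters[1:10], y = c(rep(2, 5), rep(5, 5)),
                  z = rnorm(10), grp = factor(rep(c("u", "v"), 5)), key = "x")

# Numeric columns get divided; x and any non-numeric columns are kept as-is
is_num    <- vapply(dt1, is.numeric, logical(1))
num_cols  <- names(dt1)[is_num]    # "y", "z"
keep_cols <- names(dt1)[!is_num]   # "x", "grp"

# Build the call list(x = x, grp = grp, y = y / i.y, z = z / i.z)
keep_part <- setNames(lapply(keep_cols, as.name), keep_cols)
div_part  <- setNames(
  lapply(num_cols, function(col) call("/", as.name(col), as.name(paste0("i.", col)))),
  num_cols
)
j_expr <- as.call(c(list(as.name("list")), keep_part, div_part))

# data.table substitutes the call held in j_expr, then runs the join
res <- dt1[dt2, eval(j_expr)]
```

Building a call rather than using `eval(parse(text = ...))` avoids quoting problems with unusual column names, but either form produces the same j expression.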

If you'd like to stick to the CRAN version (1.9.2), then you can do:

## 1.9.2
dt1[dt2, list(y=y/i.y, z=z/i.z)]

You don't need x in j there, because in 1.9.2 such a join returns the key columns by default (a consequence of the implicit by-without-by described below).

For those interested in the difference between the two versions:

In versions < 1.9.3 of data.table, a join of the form x[i, list(...)], that is, one where j is provided (here list(...)), implicitly performed a by-without-by operation: j was computed for each value of i's key columns that matched in x. While this is a great feature, there was no way to opt out of it, so operations that did not need by-without-by were slightly slower.

Therefore, from v1.9.3 on, we've replaced the implicit-by (by-without-by) behavior with an explicit one: it now has to be requested with by = .EACHI.

That is, in v1.9.3+, x[i, list(...)] first performs the join on i and then evaluates j once on the matched rows, not once for each row of i.

And x[i, list(...), by = .EACHI] evaluates j for each row of i, which is equivalent to the way joins were performed in versions < 1.9.3.
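A tiny sketch of the difference in a current data.table release, using a hypothetical keyed table X in which one key value matches more than one row (X, lookup, once and each are illustrative names):

```r
library(data.table)

# Hypothetical keyed table: key "a" matches two rows, "b" matches one
X <- data.table(k = c("a", "a", "b"), v = 1:3, key = "k")
lookup <- data.table(k = c("a", "b"))

# Without by: j runs once over all matched rows (1 + 2 + 3)
once <- X[lookup, sum(v)]

# With by = .EACHI: j runs once per row of lookup (a: 1 + 2, b: 3)
each <- X[lookup, sum(v), by = .EACHI]
```

`once` is a single number, while `each` is a data.table with one row per row of `lookup` (columns k and V1), which is what every such join returned before 1.9.3.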

Hope this helps you understand the difference.
