mbask

Reputation: 2481

Transform a set of columns in a data.table

A data.table novice question: I would like to transform a set of columns in a data.table by applying a mathematical formula to them. The set should exclude one or more of the table's columns.

In data.frame terms I would do:

data(iris)
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

iris[, -5] <- iris[, -5] * 1e3
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1         5100        3500         1400         200  setosa
2         4900        3000         1400         200  setosa
3         4700        3200         1300         200  setosa
4         4600        3100         1500         200  setosa
5         5000        3600         1400         200  setosa
6         5400        3900         1700         400  setosa

I know how to select multiple columns in a data.table:

library(data.table)
iris.dt <- data.table(iris)
head(iris.dt[, -5, with = FALSE])

or even:

head(iris.dt[, !"Species", with = FALSE])

How can I actually transform those selected columns while taking advantage of data.table's pass-by-reference semantics?

Upvotes: 12

Views: 4704

Answers (2)

Jonathan Rougier

Reputation: 61

.SDcols is the right approach, but you can specify the column names just once using a vector.

DT <- data.table(iris)
colnms <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
DT[, (colnms) := lapply(.SD, function(x) x*1000), .SDcols = colnms]

Note that you need the parentheses to the left of := to stop data.table interpreting colnms as the name of a column.
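Since the question asks for "everything except some columns", the name vector can also be built by exclusion rather than typed out. This is a minimal sketch, assuming "Species" is the only column to leave untouched; `setdiff()` is base R and the rest is the same call as above.

```r
library(data.table)

DT <- data.table(iris)

# Assumption: "Species" is the only column that must stay as it is.
# setdiff() keeps every column name that is not in the exclusion set.
colnms <- setdiff(names(DT), "Species")

# Same call as above: the parentheses make data.table evaluate colnms
# as a vector of names instead of creating a column called "colnms".
DT[, (colnms) := lapply(.SD, function(x) x * 1000), .SDcols = colnms]

head(DT)
```

To exclude more than one column, pass a longer character vector as the second argument to `setdiff()`.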

Upvotes: 6

A5C1D2H2I1M1N2O1R2T1

Reputation: 193507

What about using the .SDcols argument along with assignment by reference (:=)?

DT <- data.table(iris)
DT[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width") :=
     lapply(.SD, function(x) x * 1000), .SDcols = 1:4]
# Alternatively you can grab the names the usual way:
# DT[, names(DT)[1:4] := lapply(.SD, function(x) x*1000), .SDcols=1:4]
DT
#      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#   1:         5100        3500         1400         200    setosa
#   2:         4900        3000         1400         200    setosa
#   3:         4700        3200         1300         200    setosa
#   4:         4600        3100         1500         200    setosa
#   5:         5000        3600         1400         200    setosa
#  ---                                                            
# 146:         6700        3000         5200        2300 virginica
# 147:         6300        2500         5000        1900 virginica
# 148:         6500        3000         5200        2000 virginica
# 149:         6200        3400         5400        2300 virginica
# 150:         5900        3000         5100        1800 virginica
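One way to see the by-reference behaviour for yourself is a rough check like the one below. It is only a sketch: the names `alias`, `snapshot` and `num_cols` are made up for illustration, and the numeric columns are picked with `sapply(DT, is.numeric)` instead of by position.

```r
library(data.table)

DT <- data.table(iris)

alias    <- DT        # plain assignment: a second name for the same table
snapshot <- copy(DT)  # copy() makes an independent deep copy

# Pick the numeric columns by type, which leaves the factor Species out.
num_cols <- names(DT)[sapply(DT, is.numeric)]

# Update in place; no "DT <- ..." reassignment is needed.
DT[, (num_cols) := lapply(.SD, function(x) x * 1000), .SDcols = num_cols]

head(alias)     # follows DT, because it is the same object
head(snapshot)  # still holds the original values
```

If `alias` shows the scaled values and `snapshot` does not, the update was applied to the existing table rather than to a copy of it.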

Upvotes: 14
