Reputation: 185
I am using data.table I have a vector of the names for the new columns. I want to create these new columns but using the information from the old columns. Let me show it on a following example:
data <- data.table(a = c("OneA", "TwoB", "ThreeC"),
b = c(1, 2, 3))
newCols <- c("One", "Two", "Three")
for (newCol in newCols) {
data[, eval(newCol) := gsub(paste0("^.*", newCol), "", a)]
}
As an output I expect (and get) the following:
> data
a b One Two Three
1: OneA 1 A OneA OneA
2: TwoB 2 TwoB B TwoB
3: ThreeC 3 ThreeC ThreeC C
In this case I combine defining new columns from the vector and using the vector values themselves to fill these columns. Is there a way to do it more optimal (for example with a set())?
Upvotes: 4
Views: 127
Reputation: 83275
Actually, you where quite close. Just adapt your code to:
for (newCol in newCols) {
DT[, (newCol) := sub(newCol, "", a)]
}
and you have a solution that is both fast and memory efficient (better than using lapply
).
Alternatively you can also use set
inside the for-loop:
for (newCol in newCols) {
set(DT, j = newCol, value = sub(newCol, "", DT[["a"]]))
}
So why is a for
-loop a better choice in this situation?
lapply
the RHS (right hand sight) of :=
is evaluated first. This means that all the new columns are created first and have to be allocated and populated in memory first and are then added to the data.table
. The for
-loop approach is more efficient because it does one column at a time and thus only needs working memory for that one column.set
is a low-overhead loop-able version of :=
. It is particularly useful for repetitively updating rows of certain columns by reference (using a for-loop). set
eliminates the small overhead of the [data.table
-method and is as a result therefore faster.Above explanation is based on this answer by Matt Dowle (creator of data.table) and the help-file ?set
.
Upvotes: 4
Reputation: 102890
A base R solution of using Map()
+cbind()
data <- as.data.table(c(data,`names<-`(Map(function(x) gsub(x,"",data$a),newCols),newCols)))
such that
> data
a b One Two Three
1: OneA 1 A OneA OneA
2: TwoB 2 TwoB B TwoB
3: ThreeC 3 ThreeC ThreeC C
Upvotes: 1
Reputation: 6244
One possible approach could be:
library(data.table)
DT[, (newCols) := lapply(newCols, function(x) sub(x, "", a))][]
#> a b One Two Three
#> 1: OneA 1 A OneA OneA
#> 2: TwoB 2 TwoB B TwoB
#> 3: ThreeC 3 ThreeC ThreeC C
Data
DT <- data.table(a = c("OneA", "TwoB", "ThreeC"), b = c(1, 2, 3))
newCols <- c("One", "Two", "Three")
Upvotes: 6