Reputation: 938
I need to create a new column in my data table that contains a string of all the column names depending on the values within each row.
In the example below I only want to return the column names where all the values in the row are > 1 but this may change. The column names will be used as a formula in a regression model.
The reason for this is I want to build a model per group but within each group some of the variables have zero variance. Some are also categorical/factors but only have 1 level so I can exclude these variables from the model based on the column names.
example data table with the result column being the required output.
dt <- data.table(dept = c("a", "b", "c", "d", "e"),
x1 = c(1,2,3,4,5),
x2 = c(5,4,3,2,1),
Result = c("x1", "x1 + x2", "x1 + x2", "x1 + x2", "x2"))
Upvotes: 3
Views: 250
Reputation: 92300
I think this is a nice use case for the set
function in order to avoid by-row operations. This will look like a lot of code but this should be efficient for a big data set
## Create an empty "Res" column
set(dt, j = "Res", value = "")
## Loop though c("x1", "x2") columns and update the "Res" column
for (j in c("x1", "x2")) {
indx <- which(dt[[j]] > 1)
set(dt, i = indx, j = "Res", value = paste(dt[["Res"]][indx], j, sep = " + "))
}
## Get rid of leading `+`
set(dt, j = "Res", value = sub(" + ", "", dt[["Res"]], fixed = TRUE))
dt
# dept x1 x2 Result Res
# 1: a 1 5 x1 x2
# 2: b 2 4 x1 + x2 x1 + x2
# 3: c 3 3 x1 + x2 x1 + x2
# 4: d 4 2 x1 + x2 x1 + x2
# 5: e 5 1 x2 x1
Upvotes: 3
Reputation: 8413
library(data.table)
dt[, new_col := paste0(colnames(.SD)[.SD>1], collapse = "+"), 1:nrow(dt), .SDcols = c("x1", "x2")]
#dt
# dept x1 x2 Result new_col
#1: a 1 5 x1 x2
#2: b 2 4 x1 + x2 x1+x2
#3: c 3 3 x1 + x2 x1+x2
#4: d 4 2 x1 + x2 x1+x2
#5: e 5 1 x2 x1
Upvotes: 1