Get the column names for given values in each row

Question

I need to create a new column in my data table that contains a string of all the column names depending on the values within each row.

In the example below I only want to return the column names where all the values in the row are > 1 but this may change. The column names will be used as a formula in a regression model.

The reason for this is I want to build a model per group but within each group some of the variables have zero variance. Some are also categorical/factors but only have 1 level so I can exclude these variables from the model based on the column names.

example data table with the result column being the required output.

dt <- data.table(dept = c("a", "b", "c", "d", "e"), 
             x1 = c(1,2,3,4,5), 
             x2 = c(5,4,3,2,1),
             Result = c("x1", "x1 + x2", "x1 + x2", "x1 + x2", "x2"))

David Arenburg · Accepted Answer

I think this is a nice use case for the set function in order to avoid by-row operations. This will look like a lot of code but this should be efficient for a big data set

## Create an empty "Res" column
set(dt, j = "Res", value = "")

## Loop though c("x1", "x2") columns and update the "Res" column
for (j in c("x1", "x2")) {
  indx <- which(dt[[j]] > 1)
  set(dt, i = indx, j = "Res", value = paste(dt[["Res"]][indx], j, sep = " + "))
}

## Get rid of leading `+`
set(dt, j = "Res", value = sub(" + ", "", dt[["Res"]], fixed = TRUE))

dt
#    dept x1 x2  Result     Res
# 1:    a  1  5      x1      x2
# 2:    b  2  4 x1 + x2 x1 + x2
# 3:    c  3  3 x1 + x2 x1 + x2
# 4:    d  4  2 x1 + x2 x1 + x2
# 5:    e  5  1      x2      x1

Get the column names for given values in each row

Answers (2)

Related Questions