Debbie
Debbie

Reputation: 401

Create a new column based on a condition on multiple columns

I have this dataset below:

library(data.table)
set.seed(123)
dt <- data.table(x_1 = c(3,2,2,1,3,2,1,2,3,3),
                 x_2 = c(2,1,1,3,2,3,3,1,2,3),
                 x_3 = c(2,3,3,2,1,2,3,3,1,1),
                 y_1 = sample(2, 10, replace = T),
                 y_2 = sample(2, 10, replace = T),
                 y_3 = sample(2, 10, replace = T))

I would like to perform an if else operation on the x columns and if any of the x columns is equal to 1, it would create a new column with the value of the corresponding y column. For e.g. in the 5th row, x_3 = 1, so the new column should return the value corresponding to y_3.

My idea was to return the name of the x column which matches the condition as an intermediate column and then use the value suffix (1,2,3) to extract the value from the corresponding y column.

But the 1st step to create the intermediate column is creating a list and the rows not matching the condition is returning character(0).

cols <- c("x_1", "x_2", "x_3")
dt$int <- apply(dt[,..cols], 1, function(x) names(which(x == 1)))

My desired output:

x_1 x_2 x_3 y_1 y_2 y_3 new
3   2   2   1   2   2   NA
2   1   3   2   1   2   1
2   1   3   1   2   2   2
1   3   2   2   2   2   2
3   2   1   2   1   2   2
2   3   2   1   2   2   NA
1   3   3   2   1   2   2
2   1   3   2   1   2   1
3   2   1   2   1   1   1
3   3   1   1   2   1   1

Any ideas on how to achieve this? A data-table based solution would be preferable.

Upvotes: 4

Views: 493

Answers (3)

Jaap
Jaap

Reputation: 83215

Another possible solution:

ix <- dt[, max.col(.SD == 1) * NA^(!rowSums(.SD == 1)), .SDcols = 1:3]

dt[, newcol := as.matrix(.SD)[cbind(.I, ix)]
   , .SDcols = 4:6][]

which gives:

    x_1 x_2 x_3 y_1 y_2 y_3 newcol
 1:   3   2   2   1   2   2     NA
 2:   2   1   3   2   1   2      1
 3:   2   1   3   1   2   2      2
 4:   1   3   2   2   2   2      2
 5:   3   2   1   2   1   2      2
 6:   2   3   2   1   2   2     NA
 7:   1   3   3   2   1   2      2
 8:   2   1   3   2   1   2      1
 9:   3   2   1   2   1   1      1
10:   3   3   1   1   2   1      1

Notes:

  • Instead of as.matrix you can also use as.data.frame.
  • If you have more than one x column which is equal to 1, then you need to use the ties.method-parameter of max.col. You can choose between "random", "first" or "last".

If you don't know the column positions beforehand, you can generalize the above solution to:

xcols <- like(names(dt), "x")
ycols <- like(names(dt), "y")

ix <- dt[, max.col(.SD == 1) * NA^(!rowSums(.SD == 1)), .SDcols = xcols]

dt[, newcol := as.matrix(.SD)[cbind(.I, ix)]
   , .SDcols = ycols][]

Upvotes: 2

chinsoon12
chinsoon12

Reputation: 25225

Not sure how you want to handle cases without any 1s or multiple ones in the x_* in the same row.

Here is a possible approach using data.table::melt into a long format, then find the first location 1 among the x_, then access the y_ value

dt[, rn:=.I]
dt[melt(dt, id.vars="rn", meas=list(c("x_1", "x_2", "x_3"), c("y_1", "y_2", "y_3")))[,
    value2[which(value1==1L)[1L]], by=.(rn)], yval := V1, on=.(rn)]

output:

    x_1 x_2 x_3 y_1 y_2 y_3 rn yval
 1:   3   2   2   1   2   2  1   NA
 2:   2   1   3   2   1   2  2    1
 3:   2   1   3   1   2   2  3    2
 4:   1   3   2   2   2   2  4    2
 5:   3   2   1   2   1   2  5    2
 6:   2   3   2   1   2   2  6   NA
 7:   1   3   3   2   1   2  7    2
 8:   2   1   3   2   1   2  8    1
 9:   3   2   1   2   1   1  9    1
10:   3   3   1   1   2   1 10    1

edit: incorporating thelatemail's succinct version and also handling multiple ones as well

dt[, yval := 
    melt(dt, id.vars="rn", measure.vars=patterns("^x_", "^y_"))[value1==1L][
        dt, value2, on=.(rn), mult="first"]
]

Upvotes: 2

akrun
akrun

Reputation: 886938

Here is one option with Map. Subset the subset of data.table (.SD) for the 'x' and 'y' columns, create a logical vector of the 'x' columns and get the corresponding 'y' value where 'x' is 1, and collapse it to a single element with pmin (assuming that 'x columns do not have more than 1 per each row)

dt[, new := do.call(pmin, c(Map(function(x, y) y * NA^(x != 1),
      .SD[, 1:3, with = FALSE], .SD[, 4:6, with = FALSE]), na.rm = TRUE)), ]
dt
#    x_1 x_2 x_3 y_1 y_2 y_3 new
# 1:   3   2   2   1   2   2  NA
# 2:   2   1   3   2   1   2   1
# 3:   2   1   3   1   2   2   2
# 4:   1   3   2   2   2   2   2
# 5:   3   2   1   2   1   2   2
# 6:   2   3   2   1   2   2  NA
# 7:   1   3   3   2   1   2   2
# 8:   2   1   3   2   1   2   1
# 9:   3   2   1   2   1   1   1
#10:   3   3   1   1   2   1   1

Upvotes: 0

Related Questions