Reputation: 401
I have this dataset below:
library(data.table)
set.seed(123)
dt <- data.table(x_1 = c(3,2,2,1,3,2,1,2,3,3),
x_2 = c(2,1,1,3,2,3,3,1,2,3),
x_3 = c(2,3,3,2,1,2,3,3,1,1),
y_1 = sample(2, 10, replace = T),
y_2 = sample(2, 10, replace = T),
y_3 = sample(2, 10, replace = T))
I would like to perform an if else operation on the x
columns and if any of the x columns is equal to 1, it would create a new column with the value of the corresponding y column. For e.g. in the 5th row, x_3 = 1
, so the new column should return the value corresponding to y_3.
My idea was to return the name of the x column which matches the condition as an intermediate column and then use the value suffix (1,2,3) to extract the value from the corresponding y column.
But the 1st step to create the intermediate column is creating a list and the rows not matching the condition is returning character(0)
.
cols <- c("x_1", "x_2", "x_3")
dt$int <- apply(dt[,..cols], 1, function(x) names(which(x == 1)))
My desired output:
x_1 x_2 x_3 y_1 y_2 y_3 new
3 2 2 1 2 2 NA
2 1 3 2 1 2 1
2 1 3 1 2 2 2
1 3 2 2 2 2 2
3 2 1 2 1 2 2
2 3 2 1 2 2 NA
1 3 3 2 1 2 2
2 1 3 2 1 2 1
3 2 1 2 1 1 1
3 3 1 1 2 1 1
Any ideas on how to achieve this? A data-table based solution would be preferable.
Upvotes: 4
Views: 493
Reputation: 83215
Another possible solution:
ix <- dt[, max.col(.SD == 1) * NA^(!rowSums(.SD == 1)), .SDcols = 1:3]
dt[, newcol := as.matrix(.SD)[cbind(.I, ix)]
, .SDcols = 4:6][]
which gives:
x_1 x_2 x_3 y_1 y_2 y_3 newcol 1: 3 2 2 1 2 2 NA 2: 2 1 3 2 1 2 1 3: 2 1 3 1 2 2 2 4: 1 3 2 2 2 2 2 5: 3 2 1 2 1 2 2 6: 2 3 2 1 2 2 NA 7: 1 3 3 2 1 2 2 8: 2 1 3 2 1 2 1 9: 3 2 1 2 1 1 1 10: 3 3 1 1 2 1 1
Notes:
as.matrix
you can also use as.data.frame
.ties.method
-parameter of max.col
. You can choose between "random"
, "first"
or "last"
.If you don't know the column positions beforehand, you can generalize the above solution to:
xcols <- like(names(dt), "x")
ycols <- like(names(dt), "y")
ix <- dt[, max.col(.SD == 1) * NA^(!rowSums(.SD == 1)), .SDcols = xcols]
dt[, newcol := as.matrix(.SD)[cbind(.I, ix)]
, .SDcols = ycols][]
Upvotes: 2
Reputation: 25225
Not sure how you want to handle cases without any 1s or multiple ones in the x_* in the same row.
Here is a possible approach using data.table::melt
into a long format, then find the first location 1 among the x_, then access the y_ value
dt[, rn:=.I]
dt[melt(dt, id.vars="rn", meas=list(c("x_1", "x_2", "x_3"), c("y_1", "y_2", "y_3")))[,
value2[which(value1==1L)[1L]], by=.(rn)], yval := V1, on=.(rn)]
output:
x_1 x_2 x_3 y_1 y_2 y_3 rn yval
1: 3 2 2 1 2 2 1 NA
2: 2 1 3 2 1 2 2 1
3: 2 1 3 1 2 2 3 2
4: 1 3 2 2 2 2 4 2
5: 3 2 1 2 1 2 5 2
6: 2 3 2 1 2 2 6 NA
7: 1 3 3 2 1 2 7 2
8: 2 1 3 2 1 2 8 1
9: 3 2 1 2 1 1 9 1
10: 3 3 1 1 2 1 10 1
edit: incorporating thelatemail's succinct version and also handling multiple ones as well
dt[, yval :=
melt(dt, id.vars="rn", measure.vars=patterns("^x_", "^y_"))[value1==1L][
dt, value2, on=.(rn), mult="first"]
]
Upvotes: 2
Reputation: 886938
Here is one option with Map
. Subset the subset of data.table (.SD
) for the 'x' and 'y' columns, create a logical vector of the 'x' columns and get the corresponding 'y' value where 'x' is 1, and collapse it to a single element with pmin
(assuming that 'x columns do not have more than 1 per each row)
dt[, new := do.call(pmin, c(Map(function(x, y) y * NA^(x != 1),
.SD[, 1:3, with = FALSE], .SD[, 4:6, with = FALSE]), na.rm = TRUE)), ]
dt
# x_1 x_2 x_3 y_1 y_2 y_3 new
# 1: 3 2 2 1 2 2 NA
# 2: 2 1 3 2 1 2 1
# 3: 2 1 3 1 2 2 2
# 4: 1 3 2 2 2 2 2
# 5: 3 2 1 2 1 2 2
# 6: 2 3 2 1 2 2 NA
# 7: 1 3 3 2 1 2 2
# 8: 2 1 3 2 1 2 1
# 9: 3 2 1 2 1 1 1
#10: 3 3 1 1 2 1 1
Upvotes: 0