Reputation: 45

Pick a column to multiply with, contingent on value of other variables

I am still taking my first steps with R and have found SO to be a great tool for learning more and finding answers to my questions. For this one, though, I did not manage to find a good solution here.

I have a dataframe that can be simplified to this structure:

set.seed(10)
df <- data.frame(v1 = rep(1:2, times=3), 
v2 = c("A","B","B","A","B","A"), 
v3 = sample(1:6), 
xA_1 = sample(1:6), 
xA_2 = sample(1:6),
xB_1 = sample(1:6), xB_2 = sample(1:6))

df thus looks like this:

> df
  v1 v2 v3 xA_1 xA_2 xB_1 xB_2
1  1  A  4    2    1    3    3
2  2  B  2    6    3    5    4
3  1  B  5    3    2    4    5
4  2  A  3    5    4    2    1
5  1  B  1    4    6    6    2
6  2  A  6    1    5    1    6

I now want R to create a fourth variable, which is dependent on the values of v1 and v2. I achieve this by using the following code:

library(data.table)
df <- data.table(df)
df[, v4 := ifelse(v1 == 1 & v2 == "A", v3*xA_1, 
        ifelse(v1 == 1 & v2 == "B", v3*xB_1,
         ifelse(v1 == 2 & v2 == "A", v3*xA_2,
          ifelse(v1 == 2 & v2 == "B", v3*xB_2, v3*1))))]

So v4 is created by multiplying v3 by the column whose name contains the row's v2 and v1 values (e.g. for row 1: v1 = 1 and v2 = A, so multiply v3 = 4 by xA_1 = 2 to get 8).

> df$v4
[1]  8  8 20 12  6 30

Obviously, my ifelse approach becomes tedious when v1 and v2 take many more distinct values than in this example. So I am looking for an efficient way to tell R: if v1 == y & v2 == z, multiply v3 by column xz_y (e.g. v1 == 1 & v2 == "A" uses xA_1).

I tried writing a for loop, writing a function that takes y and z as indices, and using the apply function. However, none of these worked as intended.

I appreciate any ideas!

Upvotes: 3

Answers (3)

talat

Reputation: 70256

Here's a base R option:

i <- paste0("x", df$v2, "_", df$v1)
df$v4 <- df$v3 * as.numeric(df[cbind(1:nrow(df), match(i, names(df)))])

For the sample data provided below, it creates a column v4 as:

> df$v4
[1] 25 12  2  6  3 10

Or if you want to include the "else" condition to multiply by 1 in case there's no matching column name:

i <- paste0("x", df$v2, "_", df$v1)
tmp <- as.numeric(df[cbind(1:nrow(df), match(i, names(df)))])
df$v4 <- df$v3 * ifelse(is.na(tmp), 1, tmp)
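To see the fallback in action, here is a small sketch on made-up data (these values are not from the question): the third row has v2 = "C", and no xC_1 column exists, so match() returns NA for that row and v3 is multiplied by 1.

```r
# Hypothetical toy data: row 3 points at a column ("xC_1") that does not exist
df <- data.frame(
  v1   = c(1L, 2L, 1L),
  v2   = c("A", "B", "C"),
  v3   = c(4L, 2L, 5L),
  xA_1 = c(2L, 6L, 3L),
  xB_2 = c(3L, 4L, 5L)
)

i   <- paste0("x", df$v2, "_", df$v1)  # "xA_1" "xB_2" "xC_1"
pos <- match(i, names(df))             # 4 5 NA (no column named "xC_1")

# A (row, column) index matrix containing NA yields NA for that row
tmp <- as.numeric(df[cbind(seq_len(nrow(df)), pos)])

# Replace the NA multiplier by 1 so the unmatched row keeps v3 unchanged
v4 <- df$v3 * ifelse(is.na(tmp), 1, tmp)
print(v4)  # 8 8 5
```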

Sample data:

df <- structure(list(v1 = c(1L, 2L, 1L, 2L, 1L, 2L), v2 = structure(c(1L, 
2L, 2L, 1L, 2L, 1L), .Label = c("A", "B"), class = "factor"), 
    v3 = c(5L, 4L, 1L, 6L, 3L, 2L), xA_1 = c(5L, 6L, 3L, 1L, 
    2L, 4L), xA_2 = c(6L, 4L, 2L, 1L, 3L, 5L), xB_1 = c(4L, 6L, 
    2L, 5L, 1L, 3L), xB_2 = c(5L, 3L, 2L, 4L, 1L, 6L)), .Names = c("v1", 
"v2", "v3", "xA_1", "xA_2", "xB_1", "xB_2"), row.names = c(NA, 
-6L), class = "data.frame")
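For anyone who wants to see what the matrix indexing is doing, here is a step-by-step sketch on the data printed in the question. It is a slight variant rather than the exact code above: it indexes a numeric matrix built from the x columns only, which avoids the round trip through character. The names target, x_cols, col_idx and x_mat are just illustrative.

```r
# Data frame rebuilt from the values printed in the question
df <- data.frame(
  v1   = c(1L, 2L, 1L, 2L, 1L, 2L),
  v2   = c("A", "B", "B", "A", "B", "A"),
  v3   = c(4L, 2L, 5L, 3L, 1L, 6L),
  xA_1 = c(2L, 6L, 3L, 5L, 4L, 1L),
  xA_2 = c(1L, 3L, 2L, 4L, 6L, 5L),
  xB_1 = c(3L, 5L, 4L, 2L, 6L, 1L),
  xB_2 = c(3L, 4L, 5L, 1L, 2L, 6L)
)

# Step 1: name of the column each row should use, e.g. "xA_1" for row 1
target <- paste0("x", df$v2, "_", df$v1)

# Step 2: position of that column among the x columns only
x_cols  <- grep("^x", names(df), value = TRUE)
col_idx <- match(target, x_cols)  # 1 4 3 2 3 2

# Step 3: a two-column (row, column) matrix picks one cell per row
x_mat  <- as.matrix(df[x_cols])
picked <- x_mat[cbind(seq_len(nrow(df)), col_idx)]

v4 <- df$v3 * picked
print(v4)  # 8 8 20 12 6 30
```

On the question's data this gives the same v4 as the nested ifelse.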

Upvotes: 2

eddi

Reputation: 49448

This is a standard "wide" table problem - what you want is harder to do as-is, but easy when the data is "melted":

library(data.table)
dt = as.data.table(df)

melt(dt, id.vars = c('v1', 'v2', 'v3'))[variable == paste0('x', v2, '_', v1)
   ][dt, on = c('v1', 'v2', 'v3'), v3 * value]
#[1]  8  8 20 12  6 30
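If it helps to see the "long" idea without data.table, here is a rough base R sketch of the same logic on the data printed in the question. It is not the code above: it builds the long table by hand and adds an explicit row id (an assumption of this sketch), so rows stay distinguishable even when v1, v2 and v3 repeat. It also assumes every row has a matching x column.

```r
# Data frame rebuilt from the values printed in the question
df <- data.frame(
  v1   = c(1L, 2L, 1L, 2L, 1L, 2L),
  v2   = c("A", "B", "B", "A", "B", "A"),
  v3   = c(4L, 2L, 5L, 3L, 1L, 6L),
  xA_1 = c(2L, 6L, 3L, 5L, 4L, 1L),
  xA_2 = c(1L, 3L, 2L, 4L, 6L, 5L),
  xB_1 = c(3L, 5L, 4L, 2L, 6L, 1L),
  xB_2 = c(3L, 4L, 5L, 1L, 2L, 6L)
)

x_cols <- grep("^x", names(df), value = TRUE)

# "Melt" by hand: one line per (original row, x column) pair
long <- data.frame(
  row      = rep(seq_len(nrow(df)), times = length(x_cols)),
  variable = rep(x_cols, each = nrow(df)),
  value    = unlist(df[x_cols], use.names = FALSE)
)

# Column name each original row is looking for
long$target <- paste0("x", df$v2, "_", df$v1)[long$row]

# Keep only the line whose variable matches, then restore the row order
keep <- long[long$variable == long$target, ]
keep <- keep[order(keep$row), ]

v4 <- df$v3 * keep$value
print(v4)  # 8 8 20 12 6 30
```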

Upvotes: 2

F. Kroell

Reputation: 36

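You can try this:

v4 <- numeric(nrow(df))  # preallocate instead of growing the vector
for (i in seq_len(nrow(df))) {
  # name of the column to use for row i, e.g. "xA_1"
  col <- paste0("x", df$v2[i], "_", df$v1[i])
  # df[[col]][i] works whether df is a data.frame or a data.table
  v4[i] <- df$v3[i] * df[[col]][i]
}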

df$v4 <- v4

Upvotes: 0
