lemnbalm

R: Create Multiple New Variables

I'm having trouble with a complex variable-assignment task. I need to create new variables based on the values in existing variables, and I need to repeat the same new-variable creation many times, using a different set of existing variables each time. For example, I have the following data frame:

dat = data.frame(
  kiwi_3 = c(1, 0, 1),
  kiwi_5 = c(0, 0, 1),
  kiwi_8 = c(1, 1, 0),

  apple_3 = c(0, 0, 0),
  apple_5 = c(1, 0, 1),
  apple_8 = c(1, 1, 0))

I can use the following code to create one new variable based on the values in existing variables:

library(dplyr)

dat <- dat %>%
  mutate(fruit_3 = case_when(kiwi_3 == 1 & apple_3 == 1 ~ 1,
                             kiwi_3 == 1 & apple_3 == 0 ~ 2))

However, what I need is to create one variable for each number suffix (in this example, fruit_3, fruit_5 and fruit_8), each based on the variables with the same (non-sequential) suffix and using the same logic. The actual logic is also more complicated than in this example, so I expect case_when will be necessary. I imagine there's a solution using dplyr's mutate and across, possibly with indexed vectors holding the variable names, but I haven't found one that works yet.
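A minimal sketch of the suffix-driven approach described above, assuming dplyr (1.0 or later) is installed and that every kiwi_ column has an apple_ partner with the same suffix. The helper `recode_pair()` is hypothetical: it holds the two case_when rules from the example and is where the fuller logic would go. The loop derives the suffixes from the column names, builds each new name with `paste0()`, and looks up the paired columns through the `.data` pronoun.

```r
library(dplyr)

dat <- data.frame(
  kiwi_3 = c(1, 0, 1), kiwi_5 = c(0, 0, 1), kiwi_8 = c(1, 1, 0),
  apple_3 = c(0, 0, 0), apple_5 = c(1, 0, 1), apple_8 = c(1, 1, 0)
)

# Hypothetical helper: the recoding rules for one kiwi/apple pair.
# Rows that match no rule get NA, as with any case_when() call.
recode_pair <- function(kiwi, apple) {
  case_when(
    kiwi == 1 & apple == 1 ~ 1,
    kiwi == 1 & apple == 0 ~ 2
  )
}

# Suffixes taken from the kiwi_* column names: "3", "5", "8"
suffixes <- sub("^kiwi_", "", grep("^kiwi_", names(dat), value = TRUE))

# One mutate() per suffix; !! and := let the new column name be built as a string
for (s in suffixes) {
  dat <- dat %>%
    mutate(!!paste0("fruit_", s) := recode_pair(.data[[paste0("kiwi_", s)]],
                                                .data[[paste0("apple_", s)]]))
}
```

Keeping the rules in one function means the logic is written once and reused for every suffix, so adding conditions only touches `recode_pair()`.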

Thanks for any suggestions!

Answers (1)

SteveM

It may be easier to work with column numbers and offsets. Here is a rough example using for loops in which each kiwi value is multiplied by the matching apple value to give a binary result (assuming that the result is 0 whenever kiwi = 0 or apple = 0):

dat=data.frame(
      kiwi_3=c(1,0,1),
      kiwi_5=c(0,0,1),
      kiwi_8=c(1,1,0),
      apple_3=c(0,0,0),
      apple_5=c(1,0,1),
      apple_8=c(1,1,0))

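# One output column per kiwi/apple pair
mat <- matrix(NA, nrow = 3, ncol = 3)

# Column j is kiwi_j; its matching apple column sits 3 positions later (j + 3).
# The product is 1 only when both values are 1, otherwise 0.
for (i in 1:3) {
  for (j in 1:3) {
    mat[i, j] <- dat[i, j] * dat[i, j + 3]
  }
}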

cbind(dat, mat)
  kiwi_3 kiwi_5 kiwi_8 apple_3 apple_5 apple_8 1 2 3
1      1      0      1       0       1       1 0 0 1
2      0      0      1       0       0       1 0 0 1
3      1      1      0       0       1       0 0 1 0

Column names would also have to be added to the output matrix. This works on its own, but others may have ideas on how to refactor the approach using dplyr.
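A possible base-R variant of the same multiplication, sketched under the assumption that each kiwi_ column has an apple_ partner with the same suffix. Pairing columns by name rather than by the fixed `j + 3` offset means the column order no longer matters, and the output columns get their fruit_ names directly. The vector names `suffixes`, `kiwi_cols`, `apple_cols` and `fruit_cols` are illustrative.

```r
dat <- data.frame(
  kiwi_3 = c(1, 0, 1), kiwi_5 = c(0, 0, 1), kiwi_8 = c(1, 1, 0),
  apple_3 = c(0, 0, 0), apple_5 = c(1, 0, 1), apple_8 = c(1, 1, 0)
)

# Suffixes from the kiwi_* names, so pairs are matched by name, not position
suffixes   <- sub("^kiwi_", "", grep("^kiwi_", names(dat), value = TRUE))
kiwi_cols  <- paste0("kiwi_", suffixes)
apple_cols <- paste0("apple_", suffixes)
fruit_cols <- paste0("fruit_", suffixes)

# Element-wise product of each kiwi/apple pair, written to named new columns
dat[fruit_cols] <- dat[kiwi_cols] * dat[apple_cols]
dat
```

This reproduces the values in the loop output above, with fruit_3, fruit_5 and fruit_8 as column names instead of 1, 2, 3.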
