R - Match column by other columns in same data frame

Question

In the following dataframe I want to create a new column called D2 that matches the corresponding A, B, or C column. For example, if D == A, I want D2 == A2.

A   A2  B   B2  C   C2  D
1   10  2   90  3   9   1
1   11  2   99  3   15  1
1   42  2   2   3   9   2
1   5   2   54  3   235 2
1   13  2   20  3   10  3
1   6   2   1   3   4   3

This is what I want the new data frame to look like:

A   A2  B   B2  C   C2  D   D2
1   10  2   90  3   9   1   10
1   11  2   99  3   15  1   11
1   42  2   2   3   9   2   2
1   5   2   54  3   235 2   54
1   13  2   20  3   10  3   10
1   6   2   1   3   4   3   4

I have succeeded in doing this with ifelse statements using dplyr, but because I am doing this with many columns, it gets tedious after a while. I was wondering if there is a more clever way to accomplish the same task.

library(dplyr)  

newdata <- olddata %>% mutate(D2=ifelse(D==A,A2,ifelse(D==B,B2,C2)))

akrun · Accepted Answer

We can do this efficiently with max.col from base R. Subset the 'olddata' with only 'A', 'B', 'C' columns ('d1'), check whether it is equal to 'D' (after replicating the 'D' to match the lengths), use max.col to find the index of the maximum element (i.e TRUE in this case, assuming that there will be a single TRUE value per rows), multiply by 2 as the 'A1', 'B2', 'C2' columns are alternating after 'A', 'B', 'C', cbind with the row sequence to create the row/column index and extract the elements based on that to create the 'D2' column.

d1 <- olddata[c("A", "B", "C")]
olddata$D2 <- olddata[cbind(1:nrow(d1), max.col(d1 == rep(olddata["D"], 
          ncol(d1)), "first")*2)]
olddata$D2
#[1] 10 11  2 54 10  4

A slightly different approach would be to compare the columns separately in a loop using lapply (should be efficient if the dataset is very big as converting to a big logical matrix can cost the memory) and based on that we subset the corresponding columns of A2, B2, C2 with mapply

i1 <- grep("^[^D]", names(olddata)) #create an index for columns that are not D
i2 <- seq(1, ncol(olddata[i1]), by = 2)#for subsetting A, B, C
i3 <- seq(2, ncol(olddata[i1]), by = 2)# for subsetting A2, B2, C2
olddata$D2 <- c(mapply(`[`, olddata[i3], lapply(olddata[i2], `==`, olddata$D)))
olddata$D2
[1] 10 11  2 54 10  4

R - Match column by other columns in same data frame

Answers (1)

Related Questions