Reputation: 65
In the following dataframe I want to create a new column called D2 that matches the corresponding A, B, or C column. For example, if D == A, I want D2 == A2.
A A2 B B2 C C2 D
1 10 2 90 3 9 1
1 11 2 99 3 15 1
1 42 2 2 3 9 2
1 5 2 54 3 235 2
1 13 2 20 3 10 3
1 6 2 1 3 4 3
This is what I want the new data frame to look like:
A A2 B B2 C C2 D D2
1 10 2 90 3 9 1 10
1 11 2 99 3 15 1 11
1 42 2 2 3 9 2 2
1 5 2 54 3 235 2 54
1 13 2 20 3 10 3 10
1 6 2 1 3 4 3 4
I have succeeded in doing this with ifelse statements using dplyr, but because I am doing this with many columns, it gets tedious after a while. I was wondering if there is a more clever way to accomplish the same task.
library(dplyr)
newdata <- olddata %>% mutate(D2=ifelse(D==A,A2,ifelse(D==B,B2,C2)))
Upvotes: 2
Views: 894
Reputation: 886938
We can do this efficiently with max.col
from base R
. Subset the 'olddata' with only 'A', 'B', 'C' columns ('d1'), check whether it is equal to 'D' (after replicating the 'D' to match the lengths), use max.col
to find the index of the maximum element (i.e TRUE in this case, assuming that there will be a single TRUE value per rows), multiply by 2 as the 'A1', 'B2', 'C2' columns are alternating after 'A', 'B', 'C', cbind
with the row sequence to create the row/column index and extract the elements based on that to create the 'D2' column.
d1 <- olddata[c("A", "B", "C")]
olddata$D2 <- olddata[cbind(1:nrow(d1), max.col(d1 == rep(olddata["D"],
ncol(d1)), "first")*2)]
olddata$D2
#[1] 10 11 2 54 10 4
A slightly different approach would be to compare the columns separately in a loop using lapply
(should be efficient if the dataset is very big as converting to a big logical matrix
can cost the memory) and based on that we subset the corresponding columns of A2, B2, C2 with mapply
i1 <- grep("^[^D]", names(olddata)) #create an index for columns that are not D
i2 <- seq(1, ncol(olddata[i1]), by = 2)#for subsetting A, B, C
i3 <- seq(2, ncol(olddata[i1]), by = 2)# for subsetting A2, B2, C2
olddata$D2 <- c(mapply(`[`, olddata[i3], lapply(olddata[i2], `==`, olddata$D)))
olddata$D2
[1] 10 11 2 54 10 4
Upvotes: 1