Replace a complex, conditional for loop with apply in R

Question

I'm relatively new to R and I'm hoping to replace my messy loop with something more elegant and faster (apply?). Basically, I want to populate a new matrix based on whether values in the same position in other matrices match one another. Let me illustrate:

>df1
  V1 V2 V3
1  A  G  A
2  T  T  T
3  C  A  A
4  G  C  G

>df2
  V1
1  A
2  T
3  C
4  G

>df3
    V1   V2   V3
1  .25  .99  .41
2  .21  .25  .75
3  .35  .65  .55
4  .75  .21  .11

>newdf <- data.frame(matrix(ncol= ncol(df3), nrow = nrow(df3)))

Note that df1 and df3 will always have the same dimensions as one another, and df2 will always have the same nrow.

If positions Match: If df1[i,j] == df2[i], then I want newdf[i,j] = df3[i,j]

If positions don't match: If df1[i,j] != df2[i], then I want newdf[i,j] = 1-df3[i,j]

For instance df1[1,2] = 'G' and df2[1] = 'A', so I want newdf[1,2] = (1- df3[1,2])

I wrote a very gross for loop to perform this successfully:

df1<- as.matrix(df1)
df2<- as.matrix(df2)
df3<- as.matrix(df3)
newdf <- data.frame(matrix(ncol= ncol(df3), nrow = nrow(df3)))

for (i in (1:nrow(df1))){
  for (j in (1:ncol(df1))){
    if (df1[i,j] == df2[i]) {
      newdf[i,j] = df3[i,j]
    } else {
      newdf[i,j] = 1 - df3[i,j]
    }
  }
}

Which gives me the desired results:

>newdf
    X1   X2   X3
1 0.25 0.01 0.41
2 0.21 0.25 0.75
3 0.35 0.35 0.45
4 0.75 0.79 0.11

This is a very slow and messy process when I have lots of data. Are there any suggestions for other ways to solve this, perhaps using the apply family? Thanks and sorry for the nasty code.

SymbolixAU · Accepted Answer

You can use apply to build a logical index of the positions that don't match, then replace the values at those positions with one minus themselves:

idx <- (!apply(df1, 2, function(x) x == df2))
## alternatively, you can use x != df2 too
## idx <- (apply(df1, 2, function(x) x != df2))

df3[idx] <- 1 - df3[idx]
df3

#     V1   V2   V3
# 1 0.25 0.01 0.41
# 2 0.21 0.25 0.75
# 3 0.35 0.35 0.45
# 4 0.75 0.79 0.11
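If you want to run this end to end, here is a minimal sketch that rebuilds the example data and checks the result against the output in the question. The matrix() calls are my assumption about what the inputs look like after as.matrix(), and result and expected are just illustrative names.

```r
# Rebuild the example inputs as matrices (assumed shape after as.matrix())
df1 <- matrix(c("A", "T", "C", "G",
                "G", "T", "A", "C",
                "A", "T", "A", "G"),
              ncol = 3, dimnames = list(NULL, c("V1", "V2", "V3")))
df2 <- matrix(c("A", "T", "C", "G"), ncol = 1, dimnames = list(NULL, "V1"))
df3 <- matrix(c(0.25, 0.21, 0.35, 0.75,
                0.99, 0.25, 0.65, 0.21,
                0.41, 0.75, 0.55, 0.11),
              ncol = 3, dimnames = list(NULL, c("V1", "V2", "V3")))

# TRUE wherever a column of df1 differs from df2
idx <- apply(df1, 2, function(x) x != df2)

# Work on a copy so the original probabilities stay available
result <- df3
result[idx] <- 1 - result[idx]

# Expected values copied from the question's output
expected <- matrix(c(0.25, 0.21, 0.35, 0.75,
                     0.01, 0.25, 0.35, 0.79,
                     0.41, 0.75, 0.45, 0.11),
                   ncol = 3, dimnames = list(NULL, c("V1", "V2", "V3")))
stopifnot(isTRUE(all.equal(result, expected)))
```

Writing into a copy rather than df3 itself is only a design choice for the sketch; it makes it easy to rerun the comparison.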

Explanation

The apply call gives a matrix of TRUE/FALSE values showing where each column of df1 matches df2:

       V1    V2    V3
[1,] TRUE FALSE  TRUE
[2,] TRUE  TRUE  TRUE
[3,] TRUE FALSE FALSE
[4,] TRUE FALSE  TRUE

So taking the negation of this using ! gives the opposite values.

!apply(df1, 2, function(x) x == df2)
        V1    V2    V3
[1,] FALSE  TRUE FALSE
[2,] FALSE FALSE FALSE
[3,] FALSE  TRUE  TRUE
[4,] FALSE  TRUE FALSE

This tells us which values of df3 need to change. After the assignment above, those positions hold the flipped values:

df3[idx]
[1] 0.01 0.35 0.79 0.45
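In case it isn't obvious why df3[idx] comes back as a plain vector: indexing a matrix with a logical matrix of the same shape pulls out the TRUE cells in column-major order. A small sketch, with df3 and idx typed in by hand from the tables above (pos, before and after are illustrative names):

```r
df3 <- matrix(c(0.25, 0.21, 0.35, 0.75,
                0.99, 0.25, 0.65, 0.21,
                0.41, 0.75, 0.55, 0.11), ncol = 3)
idx <- matrix(c(FALSE, FALSE, FALSE, FALSE,
                TRUE,  FALSE, TRUE,  TRUE,
                FALSE, FALSE, TRUE,  FALSE), ncol = 3)

# Row/column positions of the cells that will be flipped
pos <- which(idx, arr.ind = TRUE)

# Original values, read column by column: three from column 2, one from column 3
before <- df3[idx]

# Assigning through the same index only touches those cells
df3[idx] <- 1 - df3[idx]
after <- df3[idx]
```

Here before holds the untouched probabilities and after holds the four values printed above.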

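An alternative is to make df2 the same size as df1, so the two can be compared directly

## assumes df1 and df2 are matrices, as after as.matrix() in the question
## matrix() fills column by column, so df2 is repeated in every column
df2 <- matrix(df2, nrow = nrow(df1), ncol = ncol(df1))

df1 != df2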
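Since R recycles a shorter vector down the columns of a matrix, you can probably skip the expansion altogether. The sketch below assumes df1 and df3 are matrices and df2 is a single-column matrix, as after the as.matrix() calls in the question; mismatch and newmat are illustrative names, and ifelse() is a swapped-in alternative to the index assignment, not part of the approach above.

```r
df1 <- matrix(c("A", "T", "C", "G",
                "G", "T", "A", "C",
                "A", "T", "A", "G"), ncol = 3)
df2 <- matrix(c("A", "T", "C", "G"), ncol = 1)
df3 <- matrix(c(0.25, 0.21, 0.35, 0.75,
                0.99, 0.25, 0.65, 0.21,
                0.41, 0.75, 0.55, 0.11), ncol = 3)

# as.vector() drops the dim attribute, so df2 is recycled down each column of df1
mismatch <- df1 != as.vector(df2)

# ifelse() builds the result in one step and leaves df3 unchanged
newmat <- ifelse(mismatch, 1 - df3, df3)
```

Comparing df1 with the one-column matrix df2 directly would fail because the dimensions don't conform, which is why the dim attribute is dropped first.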
