Reputation: 23
I'm relatively new to R and I'm hoping to replace my messy loop with something more elegant and faster (apply?). Basically, I want to populate a new matrix based on whether values in the same position in other matrices match one another. Let me illustrate:
>df1
V1 V2 V3
1 A G A
2 T T T
3 C A A
4 G C G
>df2
V1
1 A
2 T
3 C
4 G
>df3
V1 V2 V3
1 .25 .99 .41
2 .21 .25 .75
3 .35 .65 .55
4 .75 .21 .11
>newdf <- data.frame(matrix(ncol= ncol(df3), nrow = nrow(df3)))
Note that df1 and df3 will always have the same dimensions as one another, and df2 will always have the same number of rows.
If the positions match, i.e. df1[i,j] == df2[i], then I want newdf[i,j] = df3[i,j].
If the positions don't match, i.e. df1[i,j] != df2[i], then I want newdf[i,j] = 1 - df3[i,j].
For instance, df1[1,2] = 'G' and df2[1] = 'A', so I want newdf[1,2] = 1 - df3[1,2].
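Written as a single element-wise rule (i indexes rows, j indexes columns), with the cell from the example worked through, the requirement reads roughly as follows:

```latex
\text{newdf}_{ij} =
\begin{cases}
  \text{df3}_{ij}     & \text{if } \text{df1}_{ij} = \text{df2}_{i},\\
  1 - \text{df3}_{ij} & \text{if } \text{df1}_{ij} \neq \text{df2}_{i}.
\end{cases}
\qquad
\text{df1}_{12} = \text{G} \neq \text{A} = \text{df2}_{1}
\;\Rightarrow\;
\text{newdf}_{12} = 1 - 0.99 = 0.01
```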
I wrote a very gross for loop to perform this successfully:
df1 <- as.matrix(df1)
df2 <- as.matrix(df2)
df3 <- as.matrix(df3)
newdf <- data.frame(matrix(ncol = ncol(df3), nrow = nrow(df3)))
for (i in 1:nrow(df1)) {
  for (j in 1:ncol(df1)) {
    if (df1[i, j] == df2[i]) {
      newdf[i, j] <- df3[i, j]        # letters match: keep the value
    } else {
      newdf[i, j] <- 1 - df3[i, j]    # letters differ: use the complement
    }
  }
}
Which gives me the desired results:
>newdf
X1 X2 X3
1 0.25 0.01 0.41
2 0.21 0.25 0.75
3 0.35 0.35 0.45
4 0.75 0.79 0.11
This is a very slow and messy process when I have lots of data. Are there any suggestions for other ways to solve this, perhaps using the apply family? Thanks and sorry for the nasty code.
Upvotes: 2
Views: 949
Reputation: 26248
You can use apply to create an index of the values that don't match, then subtract those values from one:
idx <- (!apply(df1, 2, function(x) x == df2))
## alternatively, you can use x != df2 too
## idx <- (apply(df1, 2, function(x) x != df2))
df3[idx] <- 1 - df3[idx]
df3
# V1 V2 V3
# 1 0.25 0.01 0.41
# 2 0.21 0.25 0.75
# 3 0.35 0.35 0.45
# 4 0.75 0.79 0.11
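For a copy-and-paste check, here is a self-contained sketch of the same apply approach. It assumes the example data are rebuilt from the printed tables with stringsAsFactors = FALSE, compares against df2[, 1] (a plain vector) rather than the one-column data frame, and writes into a copy called newdf so the original df3 is left untouched; those choices are illustrative, not part of the answer above.

```r
# Rebuild the example data (values transcribed from the printed tables;
# letters kept as character rather than factors)
df1 <- data.frame(V1 = c("A", "T", "C", "G"),
                  V2 = c("G", "T", "A", "C"),
                  V3 = c("A", "T", "A", "G"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(V1 = c("A", "T", "C", "G"), stringsAsFactors = FALSE)
df3 <- data.frame(V1 = c(0.25, 0.21, 0.35, 0.75),
                  V2 = c(0.99, 0.25, 0.65, 0.21),
                  V3 = c(0.41, 0.75, 0.55, 0.11))

# Logical matrix: TRUE wherever df1 does NOT match df2 in the same row
idx <- !apply(df1, 2, function(x) x == df2[, 1])

# Work on a copy so the original values in df3 stay intact
newdf <- df3
newdf[idx] <- 1 - newdf[idx]
print(newdf)
```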
Here, the apply call gives a matrix of TRUE/FALSE values indicating whether each element of df1 matches the value of df2 in the same row:
V1 V2 V3
[1,] TRUE FALSE TRUE
[2,] TRUE TRUE TRUE
[3,] TRUE FALSE FALSE
[4,] TRUE FALSE TRUE
Taking the negation of this with ! gives the opposite values, so TRUE now marks a mismatch:
!apply(df1, 2, function(x) x == df2)
V1 V2 V3
[1,] FALSE TRUE FALSE
[2,] FALSE FALSE FALSE
[3,] FALSE TRUE TRUE
[4,] FALSE TRUE FALSE
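This tells us which values of df3 we need to change. Note that df3 has already been updated by this point, so df3[idx] returns the new values (before the update they were 0.99, 0.65, 0.21 and 0.55):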
df3[idx]
[1] 0.01 0.35 0.79 0.45
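An alternative is to make df2 the same size as df1 and compare the two directly. Building the wider copy with matrix() works whether df2 is a data frame or a matrix:
df2_wide <- matrix(as.matrix(df2)[, 1], nrow = nrow(df1), ncol = ncol(df1))
idx <- as.matrix(df1) != df2_wide   # TRUE where the letters differ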
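R's recycling rule can also do the widening implicitly: comparing a matrix with a vector whose length equals the number of rows recycles the vector down each column, so element [i, j] is compared with element i. A minimal sketch using ifelse instead of apply, assuming the data are held as plain matrices (the names m1, ref, m3 and newmat are hypothetical; as.matrix(df1), df2[, 1] and as.matrix(df3) should give equivalent inputs):

```r
# Example data from the question as plain matrices (filled column by column)
m1 <- matrix(c("A", "T", "C", "G",
               "G", "T", "A", "C",
               "A", "T", "A", "G"), nrow = 4)
ref <- c("A", "T", "C", "G")            # df2 as a plain character vector
m3 <- matrix(c(0.25, 0.21, 0.35, 0.75,
               0.99, 0.25, 0.65, 0.21,
               0.41, 0.75, 0.55, 0.11), nrow = 4)

# ref has length nrow(m1), so it is recycled down every column:
# m1[i, j] is compared with ref[i]
match_mat <- m1 == ref

# Keep m3 where the letters match, use 1 - m3 where they differ;
# ifelse keeps the matrix shape of its first argument
newmat <- ifelse(match_mat, m3, 1 - m3)
print(newmat)
```

This avoids calling a function once per column; the whole comparison and the selection are single vectorized operations.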
Upvotes: 1