Reputation: 2583
I have two data.frames that look like this:

df1
Gene name  sample1  sample2  sample3  sample4  sample5
A          0        1        0        0        1
B          1        0        0        1        0
C          0        0        1        1        1
D          1        0        0        1        0

df_final
Gene name  sample1  sample2  sample3  sample4  sample5
A          1        1        1        0        0
B          0        1        0        0        0
C          1        1        0        0        0
D          1        1        0        0        0
Only the values 0 and 1 are present. I would like a single data.frame in which an entry that has the same value in both data.frames keeps that value (1 stays 1, 0 stays 0), and an entry that is 1 in one data.frame (df1, for example) and 0 in the other (df2, for example, i.e. the second data.frame, shown above as df_final) becomes 1. The two data.frames have the same number of rows and the same number of columns.
The desired output will be:
df1
Gene name  sample1  sample2  sample3  sample4  sample5
A          1        1        1        0        1
B          1        1        0        1        0
C          1        1        1        1        1
D          1        1        0        1        0
Since I'm new to R, I would like to use for loops on the first and second data.frame, so that I learn how to loop over multiple data.frames. At the moment I'm not able to do this. Can anyone help me, please?
Best,
E.
Upvotes: 1
Views: 906
Reputation: 1144
What you want is known as a bitwise OR operation: https://en.wikipedia.org/wiki/Bitwise_operation#OR
There are functions for bitwise operations in R 3.0: bitwAnd, bitwNot, bitwOr, bitwShiftL, bitwShiftR and bitwXor (bitwOr is the one you are looking for).
The answer joran gave works fine, but if you are running R 3.0 I would suggest using bitwise operations, since they tend to work faster:
> system.time(for (i in 1:10000) {df3[,-1] <- ((df1[,-1] + df2[,-1]) > 0) + 0})
   user  system elapsed
  13.58    0.00   13.59
> system.time(for (i in 1:10000) {df3[,-1] = bitwOr(unlist(df1[,-1]), unlist(df2[,-1]))})
   user  system elapsed
   5.44    0.00    5.45
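If you want to try bitwOr without retyping the tables, here is a minimal sketch that rebuilds the question's data and combines it column by column. The column name Genename and the choice of integer columns are my assumptions; the question only shows the printed tables. Map() is just one way to pair up the columns, not the only one.

```r
# Rebuild the two data frames from the question (values copied from its tables).
# Assumption: the gene column is called Genename and the samples are integers.
df1 <- data.frame(
  Genename = c("A", "B", "C", "D"),
  sample1 = c(0L, 1L, 0L, 1L),
  sample2 = c(1L, 0L, 0L, 0L),
  sample3 = c(0L, 0L, 1L, 0L),
  sample4 = c(0L, 1L, 1L, 1L),
  sample5 = c(1L, 0L, 1L, 0L)
)
df2 <- data.frame(
  Genename = c("A", "B", "C", "D"),
  sample1 = c(1L, 0L, 1L, 1L),
  sample2 = c(1L, 1L, 1L, 1L),
  sample3 = c(1L, 0L, 0L, 0L),
  sample4 = c(0L, 0L, 0L, 0L),
  sample5 = c(0L, 0L, 0L, 0L)
)

df3 <- df1  # copy, so the gene column carries over unchanged
# Map() walks the sample columns of both data frames in parallel;
# bitwOr() is vectorized over each pair of integer columns.
df3[-1] <- Map(bitwOr, df1[-1], df2[-1])
df3
```

For 0/1 data the bitwise OR and the logical OR give the same answer, so this is interchangeable with the other approaches in this thread.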
Upvotes: 3
Reputation: 4807
Short way: #df3 <- as.integer(df1+df2>0)
#this was wrong
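EDIT Short way: df3 <- apply(df1[,-1]+df2[,-1]>0, c(1,2), as.integer) #[,-1] drops the gene-name column, which can't be added; the result is a matrix of the sample columns only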
#there might be shorter
With loops etc:
df3 <- as.data.frame(matrix(rep(NA, nrow(df1)*ncol(df1)), ncol=ncol(df1)))
names(df3) <- names(df1)
for(i in 1:ncol(df1)){
  for(j in 1:nrow(df1)){
    if(i==1){#edited
      df3[j,i] <- df1[j,i]#edited; note, this is dangerous b/c it is assuming the data frames are organized in the same way
    }else{#edited
      df3[j,i] <- as.integer((df1[j,i] + df2[j,i])>0)
    }#edited
  }
}
That work?
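Since the question asks specifically about looping over two data frames, here is a minimal self-contained sketch of the same idea that first checks the "organized in the same way" assumption instead of trusting it. The data is a cut-down copy of the question's tables (genes A and B, samples 1 to 3), and the column name Genename is an assumption on my part.

```r
# Cut-down version of the question's data: genes A and B, samples 1 to 3.
# The column name Genename is an assumption.
df1 <- data.frame(Genename = c("A", "B"),
                  sample1 = c(0, 1), sample2 = c(1, 0), sample3 = c(0, 0))
df2 <- data.frame(Genename = c("A", "B"),
                  sample1 = c(1, 0), sample2 = c(1, 1), sample3 = c(1, 0))

# Stop early if the two data frames are not laid out identically.
stopifnot(identical(dim(df1), dim(df2)),
          identical(names(df1), names(df2)),
          identical(df1$Genename, df2$Genename))

df3 <- df1  # start from a copy so the gene column carries over
for (i in seq_len(ncol(df1))[-1]) {   # skip column 1 (gene names)
  for (j in seq_len(nrow(df1))) {
    # 1 if either data frame has a 1 in this cell, otherwise 0
    df3[j, i] <- as.integer(df1[j, i] == 1 || df2[j, i] == 1)
  }
}
df3
```

seq_len() is used instead of 1:n so that the loops simply do nothing on an empty data frame rather than counting down from 1 to 0.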
Upvotes: 1
Reputation: 173627
The "R" way to do this sort of thing is to take advantage of vectorization:
> df3 <- df1
> df3[,-1] <- ((df1[,-1] + df2[,-1]) > 0) + 0
> df3
  Genename sample1 sample2 sample3 sample4 sample5
1        A       1       1       1       0       1
2        B       1       1       0       1       0
3        C       1       1       1       1       1
4        D       1       1       0       1       0
The loops are still happening, but under the hood, in much faster compiled code.
A brief explanation:
We can add the numeric portions of the two data frames in a vectorized fashion:
> (df1[,-1] + df2[,-1])
  sample1 sample2 sample3 sample4 sample5
1       1       2       1       0       1
2       1       1       0       1       0
3       1       1       1       1       1
4       2       1       0       1       0
Then if we ask which values are greater than zero we get the "right" answer, but in booleans instead of 0's and 1's:
> (df1[,-1] + df2[,-1]) > 0
     sample1 sample2 sample3 sample4 sample5
[1,]    TRUE    TRUE    TRUE   FALSE    TRUE
[2,]    TRUE    TRUE   FALSE    TRUE   FALSE
[3,]    TRUE    TRUE    TRUE    TRUE    TRUE
[4,]    TRUE    TRUE   FALSE    TRUE   FALSE
Luckily, if we simply add 0, R will coerce the booleans back to numbers (TRUE becomes 1 and FALSE becomes 0):
> ((df1[,-1] + df2[,-1]) > 0) + 0
     sample1 sample2 sample3 sample4 sample5
[1,]       1       1       1       0       1
[2,]       1       1       0       1       0
[3,]       1       1       1       1       1
[4,]       1       1       0       1       0
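For completeness, here is a self-contained sketch of a closely related vectorized variant that uses the logical OR operator directly instead of summing and comparing. It is an alternative, not the exact code above; the way the data frames are built and the column name Genename are assumptions, with the values copied from the question's tables.

```r
# Rebuild the question's data; Genename is an assumed column name.
genes   <- c("A", "B", "C", "D")
samples <- paste0("sample", 1:5)
m1 <- matrix(c(0,1,0,0,1,  1,0,0,1,0,  0,0,1,1,1,  1,0,0,1,0),
             nrow = 4, byrow = TRUE, dimnames = list(NULL, samples))
m2 <- matrix(c(1,1,1,0,0,  0,1,0,0,0,  1,1,0,0,0,  1,1,0,0,0),
             nrow = 4, byrow = TRUE, dimnames = list(NULL, samples))
df1 <- data.frame(Genename = genes, m1)
df2 <- data.frame(Genename = genes, m2)

df3 <- df1
# `|` treats any non-zero number as TRUE, so this is an element-wise OR;
# adding 0 turns the logical matrix back into 0/1 numbers.
df3[,-1] <- (as.matrix(df1[,-1]) | as.matrix(df2[,-1])) + 0

# Sanity check: same result as the sum-and-compare version.
expected <- ((m1 + m2) > 0) + 0
stopifnot(all(as.matrix(df3[,-1]) == expected))
df3
```

Both versions rely on the data containing only 0 and 1; with other values the sum-and-compare form and the OR form would still agree for non-negative numbers, but neither would be meaningful for this question.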
Upvotes: 3