Reputation: 35
I have two large vectors of about 100K elements each, holding integer data (e.g. 0, 1, 2, 3, ..., 70). I want to compare the two vectors element by element against multiple conditions and put a value in a third vector based on which condition holds. If I loop through them with a for loop and multiple if statements, it takes about 5 hours to run on a powerful cluster. Is there a way to speed this up, or to get the same result without looping?
Thanks.
Example:
A <- c(3,0,1,0,6,1,10,5,1,8,1,4) # 12 elements each
B <- c(1,0,5,1,0,2,2,4,0,1,2,10)
Conditions:
if(A[i]==1 && B[i]==1)
{
  C[i] <- "Alpha"
}
if(A[i]>=1 || B[i]>=1)
{
  if(A[i]>1 || B[i]>1)
  {
    C[i] <- "Bravo"
  }
}
if(A[i]==0 || B[i]==0)
{
  if(A[i]>=1 || B[i]>=1)
  {
    C[i] <- "Charlie"
  }
}
if(A[i]==0 && B[i]==0)
{
  C[i] <- "Delta"
}
Upvotes: 0
Views: 522
Reputation: 7941
R is most efficient when you work with whole vectors at once and let the underlying Fortran/C code take care of the optimisation. So you could try something like:
C <- rep("Alpha",length(A))
C[(A>=1 | B>=1) & (A>1 | B>1)] <- "Bravo"
C[(A==0 | B==0) & (A>=1 | B>=1)] <- "Charlie"
C[A==0 & B==0] <- "Delta"
Note: | and & are the vectorised versions of || and && and compare elementwise (help is at ?'|').
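To make the overwrite order explicit, here is a minimal sketch of the same assignments on the example data from the question. The names is_bravo, is_charlie and is_delta are only illustrative, and using "Alpha" as the default assumes the data contain non-negative integers with no NA values, so that every element not caught by a later mask really is the (1, 1) case.

```r
# Example vectors from the question
A <- c(3, 0, 1, 0, 6, 1, 10, 5, 1, 8, 1, 4)
B <- c(1, 0, 5, 1, 0, 2, 2, 4, 0, 1, 2, 10)

# Each comparison yields a logical vector the same length as A
# (illustrative names, not part of the original answer)
is_bravo   <- (A >= 1 | B >= 1) & (A > 1 | B > 1)
is_charlie <- (A == 0 | B == 0) & (A >= 1 | B >= 1)
is_delta   <- A == 0 & B == 0

# Start from the default, then overwrite in the same order as the if blocks,
# so a later condition wins where two masks overlap
C <- rep("Alpha", length(A))
C[is_bravo]   <- "Bravo"
C[is_charlie] <- "Charlie"  # replaces "Bravo" where exactly one value is 0
C[is_delta]   <- "Delta"

print(C)
```

Because the masks are applied in the same order as the if blocks in the loop, an element that matches both the "Bravo" and "Charlie" conditions (for example A = 6, B = 0) ends up as "Charlie", just as it would in the loop.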
Upvotes: 2
Reputation: 78792
I ran your for loop version and the results match the following:
A <- c(3,0,1,0,6,1,10,5,1,8,1,4) # 12 elements each
B <- c(1,0,5,1,0,2,2,4,0,1,2,10)
C <- ifelse((A==1 & B==1), "Alpha",
     ifelse((A==0 | B==0) & (A>=1 | B>=1), "Charlie",
     ifelse((A>=1 | B>=1) & (A>1 | B>1), "Bravo",
     ifelse(A==0 & B==0, "Delta", NA))))
C
## [1] "Bravo" "Delta" "Bravo" "Charlie" "Charlie" "Bravo" "Bravo" "Bravo" "Charlie" "Bravo"
## [11] "Bravo" "Bravo"
There's definitely a speed improvement, too:
set.seed(1492)
A <- sample(0:10, 100000, replace=TRUE)
B <- sample(0:10, 100000, replace=TRUE)
system.time(C <- ifelse((A==1 & B==1), "Alpha",
                 ifelse((A==0 | B==0) & (A>=1 | B>=1), "Charlie",
                 ifelse((A>=1 | B>=1) & (A>1 | B>1), "Bravo",
                 ifelse(A==0 & B==0, "Delta", NA)))))
## user system elapsed
## 0.350 0.004 0.354
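If you want to check the equivalence and the timing on your own machine, something along these lines should work. It is only a sketch: classify_loop and classify_ifelse are hypothetical helper names, the loop is my reconstruction of the question's if blocks (the full loop was not posted), and the timings will differ from system to system.

```r
# Hypothetical helper: the question's if blocks wrapped in an explicit loop
classify_loop <- function(A, B) {
  C <- rep(NA_character_, length(A))  # preallocate the result
  for (i in seq_along(A)) {
    if (A[i] == 1 && B[i] == 1) C[i] <- "Alpha"
    if ((A[i] >= 1 || B[i] >= 1) && (A[i] > 1 || B[i] > 1)) C[i] <- "Bravo"
    if ((A[i] == 0 || B[i] == 0) && (A[i] >= 1 || B[i] >= 1)) C[i] <- "Charlie"
    if (A[i] == 0 && B[i] == 0) C[i] <- "Delta"
  }
  C
}

# Hypothetical helper: the nested ifelse() version from this answer
classify_ifelse <- function(A, B) {
  ifelse(A == 1 & B == 1, "Alpha",
  ifelse((A == 0 | B == 0) & (A >= 1 | B >= 1), "Charlie",
  ifelse((A >= 1 | B >= 1) & (A > 1 | B > 1), "Bravo",
  ifelse(A == 0 & B == 0, "Delta", NA))))
}

set.seed(1492)
A <- sample(0:10, 100000, replace = TRUE)
B <- sample(0:10, 100000, replace = TRUE)

# Time both versions on the same data
t_loop   <- system.time(C_loop <- classify_loop(A, B))
t_ifelse <- system.time(C_vec  <- classify_ifelse(A, B))

print(identical(C_loop, C_vec))  # do both versions agree on every element?
print(t_loop)
print(t_ifelse)
```

Note that "Charlie" is tested before "Bravo" in the nested ifelse because the first matching branch wins there, whereas in the loop the last matching if block wins.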
The reason for the single & and | operators is straight from the R help:
& and && indicate logical AND and | and || indicate logical OR. The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector. Evaluation proceeds only until the result is determined. The longer form is appropriate for programming control-flow and typically preferred in if clauses.
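A small sketch of the difference, using made-up vectors x and y (not from the question). One caveat: the help text quoted above describes older behaviour; as of R 4.3.0, giving && or || an operand longer than one element is an error rather than silently using the first element, so the example only passes single values to &&.

```r
# Made-up illustration data
x <- c(1, 0, 1)
y <- c(1, 1, 0)

# & works element by element and returns a vector as long as its inputs
both_one <- x == 1 & y == 1
print(both_one)  # TRUE FALSE FALSE

# && takes single values and returns a single TRUE/FALSE, as if() needs
if (x[1] == 1 && y[1] == 1) print("first pair is (1, 1)")

# && stops as soon as the result is known, so stop() is never run here
short <- FALSE && stop("not evaluated")
print(short)  # FALSE

# any() / all() collapse a vector result when if() needs one value
if (any(both_one)) print("at least one pair is (1, 1)")
```

So inside a per-element loop with if, && and || are the natural choice, while for whole-vector work such as ifelse() or logical indexing you need & and |.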
Upvotes: 2