M. Goodge

Reputation: 11

How to compare all values in one vector to all values in another? (In minimal time)

I have the following R code:

list1 <- c(5, 6, 8, 10, 15, 26, 75)
list2 <- c(3, 6, 8, 10, 100, 42)

# Counters must exist before they are incremented in the loops
l1Bigger <- 0
l2Bigger <- 0
tie <- 0

total <- length(list1)*length(list2)
for(x in seq_along(list1)) {
  for(y in seq_along(list2)) {
    print(total - (x*y))
    if(list1[x]>list2[y]) {
      l1Bigger <- l1Bigger + 1
    } else if(list1[x]<list2[y]) {
      l2Bigger <- l2Bigger + 1
    } else {
      tie <- tie + 1
    }
  }
}
# Proportions of pairs where list1 is bigger, list2 is bigger, or they tie
percents <- c(l1Bigger/total, l2Bigger/total, tie/total)
percents

Basically, I want my code to iterate through list1 and list2, compare every value in one with every value in the other, and work out how often the values in list1 are greater than the values in list2. My current method takes a lot of time. Is there any way to reduce the time this process takes?

Thank you!

Upvotes: 1

Views: 49

Answers (2)

SymbolixAU

Reputation: 26248

You can port your nested loop to Rcpp, which should speed up the process on long vectors:

library(Rcpp)

set.seed(1)
v1 <- rnorm(100000)
v2 <- rnorm(100000)

cppFunction('NumericVector compareVectors(NumericVector v1, NumericVector v2){

            // out[0] = ties, out[1] = v1 < v2, out[2] = v1 > v2
            NumericVector out(3);

            for(int i = 0; i < v1.size(); i++){
               for(int j = 0; j < v2.size(); j++){
                  if(v1[i] == v2[j]){
                     out[0]++;
                  }else if(v1[i] < v2[j]){
                     out[1]++;
                  }else{
                     out[2]++;
                  }
               }
            }
            return out;
        }')

compareVectors(v1, v2)
# [1]          0 5008309906 4991690094
# (ties, v1 < v2, v1 > v2)

Benchmarked on vectors of length 1000, the Rcpp version is roughly six times faster (by median) than the expand.grid approach:

library(microbenchmark)

set.seed(1)
v1 <- rnorm(1000)
v2 <- rnorm(1000)

microbenchmark(

    rcpp = {
        compareVectors(v1, v2)
    },
    exg = {
        g <- expand.grid(v1, v2)
        x.bigger <- sum(g$Var1 > g$Var2)
        y.bigger <- sum(g$Var1 < g$Var2)
    }
)

# Unit: milliseconds
# expr       min        lq      mean    median        uq        max neval
# rcpp  5.600956  5.788145  6.036816  5.927468  6.183143   8.385282   100
#  exg 28.529272 35.246216 41.328205 36.000421 37.653801 540.850561   100
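
If compiling C++ is not an option, a sort-based approach in base R may also be worth trying. This is only a sketch of a different technique from the nested loop above: it sorts one vector once and uses findInterval to count, for each element of the other vector, how many values lie below it or tie with it, so the full grid of pairs is never built. The function name countComparisons is hypothetical, and the sketch assumes neither vector contains NA or NaN. I have not benchmarked it against the Rcpp version.

```r
# Hypothetical helper: counts pairs (a, b), a from v1 and b from v2,
# with a == b, a < b and a > b, without forming every pair.
# Assumes neither vector contains NA or NaN.
countComparisons <- function(v1, v2) {
  s2 <- sort(v2)
  # Number of elements of v2 strictly less than each element of v1
  less <- findInterval(v1, s2, left.open = TRUE)
  # Number of elements of v2 less than or equal to each element of v1
  lessEq <- findInterval(v1, s2)
  # Work in doubles so large counts do not overflow integer range
  total <- as.numeric(length(v1)) * length(v2)
  bigger <- sum(as.numeric(less))
  ties <- sum(as.numeric(lessEq - less))
  c(ties = ties, v1.smaller = total - bigger - ties, v1.bigger = bigger)
}

countComparisons(c(2, 4, 5, 1, 3), c(1, 6, 2, 3))
```

The result is returned in the same order as compareVectors (ties, v1 < v2, v1 > v2). For the small vectors in the call above it gives 3 ties, 8 pairs where v1 is smaller and 9 where v1 is bigger.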

Upvotes: 2

John Coleman

Reputation: 51998

expand.grid is a natural way to do this sort of thing:

> x <- c(2,4,5,1,3)
> y <- c(1,6,2,3)
> g <- expand.grid(x,y)
> x.bigger <- sum(g$Var1 > g$Var2)
> y.bigger <- sum(g$Var1 < g$Var2)
> ties <- sum(g$Var1 == g$Var2)
> x.bigger
[1] 9
> y.bigger
[1] 8
> ties
[1] 3

Of course, ties can just be computed via simple arithmetic from the other two values, but I wanted to show how you could get all three numbers directly.
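
As a rough sketch of that arithmetic, using the same x and y as above: the number of ties is the total number of pairs minus the two counts, and dividing by the total gives the proportions the question asks for. The variable name total is my own addition.

```r
x <- c(2, 4, 5, 1, 3)
y <- c(1, 6, 2, 3)
g <- expand.grid(x, y)

# Every row of g is one (x, y) pair, so nrow(g) == length(x) * length(y)
total <- nrow(g)
x.bigger <- sum(g$Var1 > g$Var2)
y.bigger <- sum(g$Var1 < g$Var2)

# Whatever is neither bigger nor smaller must be a tie
ties <- total - x.bigger - y.bigger

# Proportions of pairs: x bigger, y bigger, tied
c(x.bigger, y.bigger, ties) / total
```

This saves one pass over the grid, which only starts to matter once the vectors are long.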

Upvotes: 2
