LyricalStats9

Reputation: 55

Avoiding duplicates when using a for loop in R

I have a basic data frame containing 4 letters in a column:

a
b
c
d

I wish to use a nested for-loop to bind each letter with every other letter in a new data frame, but avoid binding a letter to itself and avoid duplicates (the same pair in reverse order, e.g. a b and b a). So far, I can avoid the former but am having trouble with the latter. My code looks like this:

d <- data.frame(c("a", "b", "c", "d"))
e <- data.frame()

for (j in d[,1]) {
  
  for (i in d[,1]) {
    
    if (j != i) {
      e <- rbind(e, c(j, i))
    }
  }
}

This produces the following:

a b #this row
a c
a d
b a #and this row are duplicates
b c
b d
c a
c b
c d
d a
d b
d c

I wish to use the nested for-loop to produce:

a b
a c
a d
b c
b d
c d

I suspect that starting the inner loop one row further down than the outer loop (in data frame d) could work, but I am not sure how to code that. I appreciate any suggestions!

Upvotes: 1

Views: 575

Answers (2)

akrun

Reputation: 887128

This is a job for combn, and it can be done easily without a loop

t(combn(d[[1]], 2))

-output

#     [,1] [,2]
#[1,] "a"  "b" 
#[2,] "a"  "c" 
#[3,] "a"  "d" 
#[4,] "b"  "c" 
#[5,] "b"  "d" 
#[6,] "c"  "d" 
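combn() returns a matrix with one combination per column, and t() turns that into one pair per row. If a data frame is needed, as in the question, the matrix can be wrapped like this. This is a sketch: the column names col1 and col2 are only illustrative, and stringsAsFactors = FALSE is set so the letters stay character on older R versions as well.

```r
d <- data.frame(letter = c("a", "b", "c", "d"), stringsAsFactors = FALSE)

# one unordered pair per row: a-b, a-c, a-d, b-c, b-d, c-d
pairs <- t(combn(d[[1]], 2))

# hypothetical column names, chosen only for readability
e <- data.frame(col1 = pairs[, 1], col2 = pairs[, 2], stringsAsFactors = FALSE)
print(e)
```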

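If the OP wants to use a loop, add a condition that skips a pair when its reversed version is already in e

# placeholder first row so that e[[1]] and e[[2]] exist for the check below;
# it is dropped with e[-1, ] at the end
e <- data.frame(col1 = "", col2 = "", stringsAsFactors = FALSE)

for (j in d[,1]) {
  for (i in d[,1]) {
    if (j != i) {
      # TRUE when the reversed pair (i, j) has not been added yet
      not_seen <- !any(i == e[[1]] & j == e[[2]])
      if (not_seen) {
        e <- rbind(e, c(j, i))
      }
    }
  }
}

-output

e[-1,]
  col1 col2
2    a    b
3    a    c
4    a    d
5    b    c
6    b    d
7    c    d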
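The "move down one row" idea from the question can also be coded directly with index-based loops: if the inner index always starts one position after the outer one, each unordered pair is produced exactly once and no membership check is needed. A minimal sketch, assuming the letters sit in the first column of d and that there are at least two of them; the names col1 and col2 are illustrative only.

```r
d <- data.frame(letter = c("a", "b", "c", "d"), stringsAsFactors = FALSE)
x <- d[[1]]
n <- length(x)  # assumes n >= 2

# collect the pairs in a list and bind once at the end,
# instead of growing a data frame with rbind() inside the loop
rows <- list()
for (j in seq_len(n - 1)) {
  # inner index starts one row below the outer index, so i > j always holds
  for (i in seq.int(j + 1, n)) {
    rows[[length(rows) + 1]] <- c(x[j], x[i])
  }
}

e <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(e) <- c("col1", "col2")
print(e)
```

Because i > j is guaranteed by the loop bounds, both the self-pairs and the reversed duplicates are excluded by construction.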

Upvotes: 4

ATpoint

Reputation: 878

I agree with @akrun's suggestion. As a rule of thumb, explicit loops are rarely needed in R for this kind of string (or, more generally, data) manipulation; a vectorised function such as combn is usually both shorter and faster.

See this speed comparison:

d <- data.frame(letters, stringsAsFactors = FALSE)

# loop version from the question (keeps both orderings of every pair)
solutionCustom <- function(x){
  e <- data.frame()
  for (j in x) {
    for (i in x) {
      if (j != i) {
        e <- rbind(e, c(j, i))
      }
    }
  }
  e
}

solutionCombn <- function(x) t(combn(x, 2))

library(microbenchmark)

microbenchmark(solutionCustom=solutionCustom(d[,1]),
               solutionCombn=solutionCombn(d[,1]))

Unit: microseconds
           expr       min        lq       mean     median         uq       max neval
 solutionCustom 44769.620 48898.410 54423.5789 54018.3875 57949.8755 76922.178   100
  solutionCombn   238.311   267.486   294.4763   286.2005   305.8805   605.728   100

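Comparing medians, the combn solution is about 188x faster, and it needs far less code. Note also that the loop version still returns both orderings of every pair (650 rows for 26 letters, versus the 325 combinations from combn), so it does not even remove the duplicates yet. Whenever you find yourself writing loops in R, there is a good chance that you are missing a much more efficient solution.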
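As a further illustration of a loop-free approach that does not rely on combn, the index pairs can be read off the upper triangle of an n x n matrix, where the row index is always smaller than the column index. This is a sketch; the helper name pairs_upper_tri and the column names are hypothetical.

```r
# Hypothetical helper: all unordered pairs of distinct elements of x, without a loop
pairs_upper_tri <- function(x) {
  n <- length(x)
  # TRUE strictly above the diagonal, i.e. where row index < column index
  keep <- upper.tri(matrix(FALSE, n, n))
  idx <- which(keep, arr.ind = TRUE)
  # sort by row, then column, to get the same row order as t(combn(x, 2))
  idx <- idx[order(idx[, 1], idx[, 2]), , drop = FALSE]
  data.frame(col1 = x[idx[, 1]], col2 = x[idx[, 2]], stringsAsFactors = FALSE)
}

e <- pairs_upper_tri(letters)
nrow(e)  # 325, i.e. choose(26, 2)
```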

Upvotes: 1
