Reputation: 55
I have a basic data frame containing 4 letters in a column:
a
b
c
d
I wish to use a nested for-loop to bind each letter with every other letter in a new data frame, but avoid binding one letter to itself and avoid duplicates. So far, I can avoid the former but am having trouble with the latter. My code looks like this:
d <- data.frame(c("a", "b", "c", "d"))
e <- data.frame()
for (j in d[,1]) {
  for (i in d[,1]) {
    if (j != i) {
      e <- rbind(e, c(j, i))
    }
  }
}
This produces the following:
a b #this row
a c
a d
b a #and this row are duplicates
b c
b d
c a
c b
c d
d a
d b
d c
I wish to use the nested for-loop to produce:
a b
a c
a d
b c
b d
c d
I know that having the for-loop run by moving down one row each time (in data frame d) could potentially work, but I am not sure how to code that. I appreciate any suggestions!
Upvotes: 1
Views: 575
Reputation: 887128
This is a case for combn, and it can be done easily without a loop:
t(combn(d[[1]], 2))
-output
#     [,1] [,2]
#[1,] "a"  "b"
#[2,] "a"  "c"
#[3,] "a"  "d"
#[4,] "b"  "c"
#[5,] "b"  "d"
#[6,] "c"  "d"
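The question asks for a data frame rather than a matrix, so the result can be wrapped in as.data.frame. This is a minimal sketch; the column names col1 and col2 and the object name pairs are only illustrative and do not come from the question.

```r
# Same data as in the question; stringsAsFactors = FALSE keeps the letters as characters
d <- data.frame(c("a", "b", "c", "d"), stringsAsFactors = FALSE)

# combn() returns one combination per column, so transpose to get one pair per row
pairs <- as.data.frame(t(combn(d[[1]], 2)))
names(pairs) <- c("col1", "col2")  # illustrative names
pairs
```

Because combn keeps the input order, each pair appears only once, with the earlier letter first.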
If the OP wants to use a loop, add a condition that skips a pair when its reverse is already stored:
e <- data.frame(col1 = "", col2 = "")
for (j in d[,1]) {
  for (i in d[,1]) {
    if (j != i) {
      # TRUE if the reversed pair (i, j) is already a row of e
      seen <- any(i == e[[1]] & j == e[[2]])
      if (!seen) {
        e <- rbind(e, c(j, i))
      }
    }
  }
}
-output
e[-1,]
  col1 col2
2    a    b
3    a    c
4    a    d
5    b    c
6    b    d
7    c    d
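The idea raised in the question, starting the inner loop one row below the outer one, can be sketched with index-based loops. This is only a sketch under assumptions: the names x, n and k and the preallocated matrix are choices made here, and it assumes the column holds at least two values.

```r
d <- data.frame(c("a", "b", "c", "d"), stringsAsFactors = FALSE)
x <- d[[1]]
n <- length(x)

# Preallocate one row per unordered pair: n * (n - 1) / 2 rows
e <- matrix(NA_character_, nrow = n * (n - 1) / 2, ncol = 2)
k <- 0
for (j in seq_len(n - 1)) {
  # The inner loop starts one position after j,
  # so a value is never paired with itself or with an earlier value
  for (i in (j + 1):n) {
    k <- k + 1
    e[k, ] <- c(x[j], x[i])
  }
}
e <- as.data.frame(e)
e
```

Filling a preallocated matrix also avoids growing a data frame with rbind on every iteration.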
Upvotes: 4
Reputation: 878
I agree with @akrun's suggestion. As a rule of thumb, loops are rarely needed in R for string manipulation, or for data manipulation in general.
See this speed comparison:
d <- data.frame(c(letters))
# Nested loop from the question: builds all 650 ordered pairs
solutionCustom <- function(x) {
  e <- data.frame()
  for (j in x[,1]) {
    for (i in x[,1]) {
      if (j != i) {
        e <- rbind(e, c(j, i))
      }
    }
  }
  e
}
# combn: builds the 325 unordered pairs
solutionCombn <- function(x) t(combn(x[,1], 2))
library(microbenchmark)
microbenchmark(solutionCustom = solutionCustom(d),
               solutionCombn = solutionCombn(d))
Unit: microseconds
           expr       min        lq       mean     median         uq       max neval
 solutionCustom 44769.620 48898.410 54423.5789 54018.3875 57949.8755 76922.178   100
  solutionCombn   238.311   267.486   294.4763   286.2005   305.8805   605.728   100
The combn solution is about 188x faster (comparing medians) and takes less code to write. Whenever you have to use loops in R, there is a good chance that you are missing a much more efficient solution.
Upvotes: 1