Ernest Aigner
Ernest Aigner

Reputation: 61

R Find best set of all possible combinations that includes all values

After quite some google effort I hope somebody can help me with the problem, that appears quite simple to me, but is maybe more complicated than I thought:

I have a data.frame with three columns. The first two reflecting all possible combinations of five variables (1-5), the last the "strength" of the combination. I look for the five rows, which include all values of Var1 and Var2 (so values 1-5) and have the highest sum in the strength column. In the example beneath, it is the five rows with a strength of 1000, as they have the highest sum and all five values (1-5) are given in the first two columns.

How do I best approach that problem? Is there a package that has implemented that task? I found now the constrOptim() function, can I do it with that?

Code to create an example dataframe:

a <-cbind(expand.grid(seq(1,5,1),seq(1,5,1)),
          strength = c(-11, 61, 230, 118, 156, 98, 169, 306, 6, -54,   
                        207, -32, 27, 128, 101, 19, -18, 32, 153, 14, 
                        63, 136, 165, 73, 35))
a <- a[order(a$strength, decreasing=T),]

Starting dataset:

 Var1 Var2 strength
 3    2      306
 3    1      230
 1    3      207
 2    2      169
 3    5      165
 5    1      156
 4    4      153
 2    5      136
 4    3      128
 4    1      118
 5    3      101
 1    2       98
 4    5       73
 1    5       63
 2    1       61
 5    5       35
 3    4       32
 3    3       27
 1    4       19
 5    4       14
 4    2        6
 1    1      -11
 2    4      -18
 2    3      -32
 5    2      -54

Not desired outcome:

 Var1 Var2 strength
 3    2      306
 3    1      230
 1    3      207
 2    2      169
 3    5      165

Desired outcome:

 Var1 Var2 strength
 3    2      306
 1    3      207
 5    1      156
 4    4      153
 2    5      136

Upvotes: 0

Views: 985

Answers (2)

Parfait
Parfait

Reputation: 107697

Consider a series of aggregation and merges between Var1 and Var2 columns:

# MERGE MAX AGGREGATES WHERE Var COL ARE EQUAL AND NOT EQUAL
mergedf1 <- merge(aggregate(strength ~ Var1, data=a[a$Var1==a$Var2,], FUN=max), 
                  a, by=c("Var1", "strength"))
mergedf2 <- merge(aggregate(strength ~ Var1, data=a[a$Var1!=a$Var2,], FUN=max), 
                  a, by=c("Var1", "strength"))

# STACK RESULTS 
mergedf <- rbind(mergedf1, mergedf2)

# FINAL MAX AGGREGATION AND MERGE
final <- merge(aggregate(strength ~ Var2, data=mergedf, FUN=max), 
               mergedf, by=c("Var2", "strength"))
final <- final[,c("Var1", "Var2", "strength")]                    # SORT COLUMNS
final <- final[with(final, order(-strength)),]                    # SORT ROWS

# REMOVE TEMP OBJECTS
rm(mergedf1, mergedf2, mergedf)

Upvotes: 1

Tal J. Levy
Tal J. Levy

Reputation: 598

I am not sure the presented solution is the most effective one, but somehow I feel that we must go over the entire dataset to find the unique pairs (for example change the value of (Var1 = 2, Var2 = 5, strength = 136) to (Var1 = 2, Var2 = 5, strength = 1). In order to find the unique pairs I use the apply function. First lets recreate the input:

a <-cbind(expand.grid(seq(1,5,1),seq(1,5,1)),
          strength = c(-11, 61, 230, 118, 156, 98, 169, 306, 6, -54,   
                        207, -32, 27, 128, 101, 19, -18, 32, 153, 14, 
                        63, 136, 165, 73, 35))
a <- a[order(a$strength, decreasing=T),]

Now I prepare an empty matrix in which I will keep Var1 in the first column, Var2 in the second and strength in the third column.

V <- matrix(nrow = 5, ncol = 3)

Next I write a function that will get one row from the ordered dataset a, will check if Var1 and Var2 are unique and if so, will store strength.

mf <- function(x){
    if( !(x[1] %in% V[,1]) & !(x[2] %in% V[,2])) {
        i <- x[1]
        V[i,1] <<- x[1]
        V[i,2] <<- x[2]
        V[i,3] <<- x[3]
    }
}

Now I apply the function on each row of a:

apply(a, 1, mf)

The needed values are stored in the matrix V:

V
     [,1] [,2] [,3]
[1,]    1    3  207
[2,]    2    5  136
[3,]    3    2  306
[4,]    4    4  153
[5,]    5    1  156

Sometimes, though going over the full dataset is not necessary (like in the example given), then we would like to be able to break the loop once the unique pairs were found. For that we can use a for loop. Here is the code:

a <-cbind(expand.grid(seq(1,5,1),seq(1,5,1)),
          strength = c(-11, 61, 230, 118, 156, 98, 169, 306, 6, -54,   
                       207, -32, 27, 128, 101, 19, -18, 32, 153, 14, 
                       63, 136, 165, 73, 35))
a <- a[order(a$strength, decreasing=T),]

V <- matrix(nrow=5,ncol=3)
for (i in 1:nrow(a)) {
    if( sum(is.na(V[,1])) == 0)
        break
    if( !(a[i,1] %in% V[,1]) & !(a[i,2] %in% V[,2])) {
        j <- a[i,1]
        V[j,1] <- a[i,1]
        V[j,2] <- a[i,2]
        V[j,3] <- a[i,3]
    }
}

Hope this helps, or at least will lead to improvements.

Upvotes: 1

Related Questions