Reputation: 61
After quite some google effort I hope somebody can help me with the problem, that appears quite simple to me, but is maybe more complicated than I thought:
I have a data.frame with three columns. The first two reflecting all possible combinations of five variables (1-5), the last the "strength" of the combination. I look for the five rows, which include all values of Var1 and Var2 (so values 1-5) and have the highest sum in the strength column. In the example beneath, it is the five rows with a strength of 1000, as they have the highest sum and all five values (1-5) are given in the first two columns.
How do I best approach that problem? Is there a package that has implemented that task? I found now the constrOptim() function, can I do it with that?
Code to create an example dataframe:
a <-cbind(expand.grid(seq(1,5,1),seq(1,5,1)),
strength = c(-11, 61, 230, 118, 156, 98, 169, 306, 6, -54,
207, -32, 27, 128, 101, 19, -18, 32, 153, 14,
63, 136, 165, 73, 35))
a <- a[order(a$strength, decreasing=T),]
Starting dataset:
Var1 Var2 strength
3 2 306
3 1 230
1 3 207
2 2 169
3 5 165
5 1 156
4 4 153
2 5 136
4 3 128
4 1 118
5 3 101
1 2 98
4 5 73
1 5 63
2 1 61
5 5 35
3 4 32
3 3 27
1 4 19
5 4 14
4 2 6
1 1 -11
2 4 -18
2 3 -32
5 2 -54
Not desired outcome:
Var1 Var2 strength
3 2 306
3 1 230
1 3 207
2 2 169
3 5 165
Desired outcome:
Var1 Var2 strength
3 2 306
1 3 207
5 1 156
4 4 153
2 5 136
Upvotes: 0
Views: 985
Reputation: 107697
Consider a series of aggregation and merges between Var1
and Var2
columns:
# MERGE MAX AGGREGATES WHERE Var COL ARE EQUAL AND NOT EQUAL
mergedf1 <- merge(aggregate(strength ~ Var1, data=a[a$Var1==a$Var2,], FUN=max),
a, by=c("Var1", "strength"))
mergedf2 <- merge(aggregate(strength ~ Var1, data=a[a$Var1!=a$Var2,], FUN=max),
a, by=c("Var1", "strength"))
# STACK RESULTS
mergedf <- rbind(mergedf1, mergedf2)
# FINAL MAX AGGREGATION AND MERGE
final <- merge(aggregate(strength ~ Var2, data=mergedf, FUN=max),
mergedf, by=c("Var2", "strength"))
final <- final[,c("Var1", "Var2", "strength")] # SORT COLUMNS
final <- final[with(final, order(-strength)),] # SORT ROWS
# REMOVE TEMP OBJECTS
rm(mergedf1, mergedf2, mergedf)
Upvotes: 1
Reputation: 598
I am not sure the presented solution is the most effective one, but somehow I feel that we must go over the entire dataset to find the unique pairs (for example change the value of (Var1 = 2, Var2 = 5, strength = 136)
to (Var1 = 2, Var2 = 5, strength = 1)
. In order to find the unique pairs I use the apply function. First lets recreate the input:
a <-cbind(expand.grid(seq(1,5,1),seq(1,5,1)),
strength = c(-11, 61, 230, 118, 156, 98, 169, 306, 6, -54,
207, -32, 27, 128, 101, 19, -18, 32, 153, 14,
63, 136, 165, 73, 35))
a <- a[order(a$strength, decreasing=T),]
Now I prepare an empty matrix in which I will keep Var1
in the first column, Var2
in the second and strength
in the third column.
V <- matrix(nrow = 5, ncol = 3)
Next I write a function that will get one row from the ordered dataset a
, will check if Var1
and Var2
are unique and if so, will store strength.
mf <- function(x){
if( !(x[1] %in% V[,1]) & !(x[2] %in% V[,2])) {
i <- x[1]
V[i,1] <<- x[1]
V[i,2] <<- x[2]
V[i,3] <<- x[3]
}
}
Now I apply the function on each row of a
:
apply(a, 1, mf)
The needed values are stored in the matrix V
:
V
[,1] [,2] [,3]
[1,] 1 3 207
[2,] 2 5 136
[3,] 3 2 306
[4,] 4 4 153
[5,] 5 1 156
Sometimes, though going over the full dataset is not necessary (like in the example given), then we would like to be able to break the loop once the unique pairs were found. For that we can use a for
loop. Here is the code:
a <-cbind(expand.grid(seq(1,5,1),seq(1,5,1)),
strength = c(-11, 61, 230, 118, 156, 98, 169, 306, 6, -54,
207, -32, 27, 128, 101, 19, -18, 32, 153, 14,
63, 136, 165, 73, 35))
a <- a[order(a$strength, decreasing=T),]
V <- matrix(nrow=5,ncol=3)
for (i in 1:nrow(a)) {
if( sum(is.na(V[,1])) == 0)
break
if( !(a[i,1] %in% V[,1]) & !(a[i,2] %in% V[,2])) {
j <- a[i,1]
V[j,1] <- a[i,1]
V[j,2] <- a[i,2]
V[j,3] <- a[i,3]
}
}
Hope this helps, or at least will lead to improvements.
Upvotes: 1