Reputation: 383
This problem is best addressed by example.
Setup
Mat1 <- matrix(nrow =9, ncol =11)
colnames(Mat1) <- c("Name", "Strategy.Assets", "Jan.94", "Jan.95", "Jan.96", "Jan.97", "1", "2", "3","4", "5")
Mat1[,1] <- letters[1:9]
Mat1[,2] <- cbind(20,30,40,50,60,30,30,40,50)
Mat1[,3:6] <- rnorm(36,0,1)
Mat1[,7] <- c(0,0,0,0,0,0,0,0,0)
Mat1[,8] <- c(0.95, 0.8, 0,0,0,0,0,0,0)
Mat1[,9] <- c(0.95,0.6,0.7,0,0,0,0,0,0)
Mat1[,10] <- c(0.08, 0.09, 0.99, 0.93, 0,0,0,0,0)
Mat1[,11] <- c(0.09,0.01,0.04,0.97, 0.80,0,0,0,0)
data.frame(Mat1)
treat columns "1" "2" "3" "4" "5" "6" as the beginning of the correlation matrix (it should be 9 by 9 but i am only showing the first 6 columns).
For each row in columns 1 to 6, i need to identify if there is a value >= 0.95. If there is, i need to know the position. In this case the value 0.95 appears in entry m= 1, n=2 of the correlation matrix. I then need to go to the column "strategy assets" and compare the values in rows 1 and 2 (in this case 20 and 30). After this, I need to omit the row with the lower value (row 1 as 20 is less than 30). After removing one row, I need to proceed to the next row in the correlation matrix - we see that in row 2, there are no values that are at least equal to 0.95. We then proceed to the third row. We then see 0.99 >0.95 occurs in position m=3, n=4. We then go to "strategy assets" and compare values in row 3 and 4. In row 3 strategy assets = 40, and row 4 =50. Thus we omit row 3. We then proceed to row 4 of the correlation matrix. in position 4,5 0.99 occurs. We then need to compare rows 4 and 5 in strategy assets. We see that row 5 =60, and row 4=50 and so need to omit row4.
I need to repeat this process for all rows in the correlation matrix.
Notice that entry (1,3) of the correlation matrix also equals 0.95. However, as row1 has already been removed (from the first iteration), i do not need the loop to continue running in this case.
> data.frame(Mat1)
Name Strategy.Assets Jan.94 Jan.95 Jan.96 Jan.97 X1 X2 X3 X4 X5
1 a 20 2.95438832384095 0.322838398649362 0.265489849080971 -1.26762589628627 0 0.95 0.95 0.08 0.09
2 b 30 -0.231467724347497 1.36209337396348 0.244352997300369 0.147019042483708 0 0.8 0.6 0.09 0.01
3 c 40 -0.725388339795491 0.0813983643917171 -0.50418751522557 0.802967414744253 0 0 0.7 0.99 0.04
4 d 50 -0.120645568994363 1.7770242881308 -0.798437656907533 1.03842921924565 0 0 0 0.93 0.97
5 e 60 0.818447801042692 -0.767858384307916 -1.23187047713004 -0.0779863946287645 0 0 0 0 0.8
6 f 30 -0.231572883434978 0.266559660007751 0.720829574412722 -0.0207732265889516 0 0 0 0 0
7 g 30 0.444651868913969 0.832322051653509 -0.538333881677844 0.291130047037627 0 0 0 0 0
8 h 40 2.00827324376109 0.775006479679076 0.0323592399331455 -2.33121256833447 0 0 0 0 0
9 i 50 -0.368493048065977 1.30727697822673 -0.446402751994181 -0.0616057412430855 0 0 0 0 0
then my desired output is:
output <- data.frame(rbind(Mat1[2,1:6], Mat1[5:9,1:6]))
output
Name Strategy.Assets Jan.94 Jan.95 Jan.96 Jan.97
1 b 30 -0.231467724347497 1.36209337396348 0.244352997300369 0.147019042483708
2 e 60 0.818447801042692 -0.767858384307916 -1.23187047713004 -0.0779863946287645
3 f 30 -0.231572883434978 0.266559660007751 0.720829574412722 -0.0207732265889516
4 g 30 0.444651868913969 0.832322051653509 -0.538333881677844 0.291130047037627
5 h 40 2.00827324376109 0.775006479679076 0.0323592399331455 -2.33121256833447
6 i 50 -0.368493048065977 1.30727697822673 -0.446402751994181 -0.0616057412430855
I need to be able to apply this function to matrices in a list. Every matrix in the list has a different number of rows, and hence a different number of columns. However the first 6 columns are always the same. Essentially, the 'names' variable increases (it could be from a to z), causing the correlation matrix to increase.(would be 26 by 26 in this case).
Upvotes: 0
Views: 89
Reputation: 389325
If I have understood you correctly, one way could be to first find out row and column index where the value is greater than equal to 0.95 for the correlation columns using which
. For every row and column pair, we find out the corresponding minimum value and remove that row from original Mat1
.
mat <- which(Mat1[, as.character(1:5)] >= 0.95, arr.ind = TRUE)
Mat1[-unique(apply(mat, 1, function(x) x[which.min(Mat1$Strategy.Assets[x])])), ]
# Name Strategy.Assets Jan.94 Jan.95 Jan.96 Jan.97 1 2 3 4 5
#2 b 30 -0.02101158 1.27504998 -0.9534582 -0.6917499 0 0.8 0.6 0.09 0.01
#5 e 60 -0.88500700 -1.19165574 -1.6309657 0.6183648 0 0.0 0.0 0.00 0.80
#6 f 30 -0.45072999 -0.04698659 0.7462178 0.5858741 0 0.0 0.0 0.00 0.00
#7 g 30 1.29131284 -1.60004027 -0.9681269 0.9595044 0 0.0 0.0 0.00 0.00
#8 h 40 0.37883959 -1.11536574 -0.3041997 -0.8202406 0 0.0 0.0 0.00 0.00
#9 i 50 1.50068308 0.89465888 0.6646096 0.4433313 0 0.0 0.0 0.00 0.00
data
Mat1 <- data.frame(matrix(nrow =9, ncol =11))
colnames(Mat1) <- c("Name", "Strategy.Assets", "Jan.94", "Jan.95", "Jan.96",
"Jan.97", "1", "2", "3","4", "5")
Mat1[,1] <- letters[1:9]
Mat1[,2] <- c(20,30,40,50,60,30,30,40,50)
Mat1[,3:6] <- rnorm(36,0,1)
Mat1[,7] <- c(0,0,0,0,0,0,0,0,0)
Mat1[,8] <- c(0.95, 0.8, 0,0,0,0,0,0,0)
Mat1[,9] <- c(0.95,0.6,0.7,0,0,0,0,0,0)
Mat1[,10] <- c(0.08, 0.09, 0.99, 0.93, 0,0,0,0,0)
Mat1[,11] <- c(0.09,0.01,0.04,0.97, 0.80,0,0,0,0)
Upvotes: 2