Laura

Reputation: 699

selecting columns specified by a random vector in R

I have a large matrix from which I would like to randomly extract a smaller matrix. (I want to do this 1000 times, so ultimately it will be in a for loop.) Say for example that I have this 9x9 matrix:

mat=matrix(c(0,0,1,0,1,0,0,0,1,0,0,0,0,1,1,1,0,0,1,0,1,0,0,0,0,0,1,0,1,0,0,0,1,
          0,0,0,0,1,1,1,0,0,1,0,1,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,0,1,1,1,0,0,
          1,0,1,0,0,0,0,0,1,0,1,0,0,0,1), nrow=9)

From this matrix, I would like a random 3x3 subset. The trick is that I do not want any of the row or column sums in the final matrix to be 0. Another important thing is that I need to know the original number of the rows and columns in the final matrix. So, if I end up randomly selecting rows 4, 5, and 7 and columns 1, 3, and 8, I want to have those identifiers easily accessible in the final matrix.

Here is what I've done so far.

First, I create a vector of row numbers and column numbers. I am trying to keep these attached to the matrix throughout.

r.num<-seq(from=1,to=nrow(mat),by=1)      #vector of row numbers
c.num<-seq(from=0, to=ncol(mat),by=1)     #vector of col numbers (0 labels the r.num column)

mat.1<-cbind(r.num,mat)
mat.2<-rbind(c.num,mat.1)

Now I have a 10x10 matrix with identifiers. I can select my rows by creating a random vector and subsetting the matrix.

rand <- sample(r.num,3)
temp1 <- rbind(mat.2[1,],mat.2[rand,])      #keep the identifier row

This works well! Now I want to randomly select 3 columns. This is where I am running into trouble. I tried doing it the same way.

rand2 <- sample(c.num,3)
temp2 <- cbind(temp1[,1],temp1[,rand2])

The problem is that I end up with some row and column sums that are 0. I can eliminate columns that sum to 0 first.

temp3 <- temp1[,which(colSums(temp1[2:nrow(temp1),])>0)]
cols <- which(colSums(temp1[2:nrow(temp1),2:ncol(temp1)])>0)
rand3 <- sample(cols,3)
temp4 <- cbind(temp3[,1],temp3[,rand3])

But I end up with an error message. For some reason, R does not like to subset the matrix this way.

So my question is: is there a better way to subset the matrix by the random vector "rand3" after the zero columns have been removed, or is there a better way to randomly select three complementary rows and columns such that none of them sums to 0?

Thank you so much for your help!

Upvotes: 5

Views: 1676

Answers (1)

aatrujillob

Reputation: 4826

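If I understand your problem correctly, this should work. It keeps drawing three random rows and three random columns until no row or column of the resulting 3x3 matrix sums to 0, then stores the original indices as row and column names: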

mat=matrix(c(0,0,1,0,1,0,0,0,1,0,0,0,0,1,1,1,0,0,1,0,1,0,0,0,0,0,1,0,1,0,0,0,1,
          0,0,0,0,1,1,1,0,0,1,0,1,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,0,1,1,1,0,0,
          1,0,1,0,0,0,0,0,1,0,1,0,0,0,1), nrow=9)

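# Start from an all-zero matrix so the loop body runs at least once
smallmatrix <- matrix(0, nrow = 3, ncol = 3)

# Redraw until no row or column of the 3x3 subset sums to 0
while (any(colSums(smallmatrix) == 0) || any(rowSums(smallmatrix) == 0)) {
  cols <- sample(ncol(mat), 3)
  rows <- sample(nrow(mat), 3)
  smallmatrix <- mat[rows, cols]
}

# Keep the original column and row numbers as dimnames
colnames(smallmatrix) <- cols
rownames(smallmatrix) <- rows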
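
Since the question mentions repeating the draw 1000 times, the loop above could be wrapped in a function. What follows is only a sketch under a few assumptions: `sample_submatrix` and its `max_tries` argument are hypothetical names that are not part of the answer above, the row and column numbers are attached as dimnames before subsetting (so they are carried along automatically), and the indices are sorted purely for readability.

```r
# Same 9x9 example matrix as in the question
mat <- matrix(c(0,0,1,0,1,0,0,0,1,0,0,0,0,1,1,1,0,0,1,0,1,0,0,0,0,0,1,0,1,0,0,0,1,
                0,0,0,0,1,1,1,0,0,1,0,1,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,0,1,1,1,0,0,
                1,0,1,0,0,0,0,0,1,0,1,0,0,0,1), nrow = 9)

# Hypothetical helper: draw an n x n random sub-matrix with no all-zero
# row or column, giving up after max_tries unsuccessful draws.
sample_submatrix <- function(m, n = 3, max_tries = 1000) {
  # Label rows and columns with their original positions;
  # subsetting with [ , ] keeps these labels.
  dimnames(m) <- list(seq_len(nrow(m)), seq_len(ncol(m)))
  for (i in seq_len(max_tries)) {
    rows <- sort(sample(nrow(m), n))
    cols <- sort(sample(ncol(m), n))
    sub <- m[rows, cols, drop = FALSE]
    # Accept the draw only if every row and column has a non-zero sum
    if (!any(rowSums(sub) == 0) && !any(colSums(sub) == 0)) {
      return(sub)
    }
  }
  stop("no valid sub-matrix found in ", max_tries, " tries")
}

set.seed(1)  # only to make the example reproducible
subs <- replicate(1000, sample_submatrix(mat), simplify = FALSE)

subs[[1]]            # a 3x3 matrix labelled with the original row/column numbers
rownames(subs[[1]])  # original row numbers, stored as character strings
```

The `max_tries` cap is there so that a matrix with no valid sub-matrix produces an error rather than an endless loop. Use `as.integer(rownames(x))` if the identifiers are needed as numbers.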

Upvotes: 4
