Reputation: 805
I have a matrix with M rows and N columns. I need to randomly sample different locations in this matrix and return their row and column indexes.
My approach: say I want to sample 30 percent of the entries in the matrix. I iterate through the whole matrix and, at each entry, toss a biased coin that comes up heads with probability 0.3, selecting the location if it lands heads. Since my data is large, this selects approximately 30% of the entries. However, I observe that this is really slow. Is there a way to speed it up, or a better way to do it?
Upvotes: 1
Views: 2754
Reputation: 24480
If m is your matrix, just try:
arrayInd(sample(length(m),0.3*length(m)),dim(m))
An example:
set.seed(1)
m<-matrix(ncol=6,nrow=6)
arrayInd(sample(length(m),0.3*length(m)),dim(m))
# [,1] [,2]
# [1,] 4 2
# [2,] 2 3
# [3,] 2 4
# [4,] 6 5
# [5,] 1 2
# [6,] 4 5
# [7,] 5 5
# [8,] 4 6
# [9,] 6 3
#[10,] 2 1
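As a rough sketch of how the one-liner above might be wrapped for reuse: the helper name sample_cells and its frac argument are made up for illustration and are not part of the answer. It only relies on base R's sample.int and arrayInd, and it rounds the sample size explicitly instead of passing a fractional size to sample.

```r
# Hypothetical helper (name and arguments are illustrative, not from the answer).
# Returns a two-column matrix of (row, col) positions sampled without replacement.
sample_cells <- function(m, frac) {
  n_pick <- round(frac * length(m))       # exact number of cells to draw
  lin <- sample.int(length(m), n_pick)    # linear (column-major) positions
  idx <- arrayInd(lin, dim(m))            # convert to row/column pairs
  colnames(idx) <- c("row", "col")
  idx
}

m <- matrix(0, nrow = 6, ncol = 4)
idx <- sample_cells(m, 0.3)
m[idx]   # a two-column index matrix selects the sampled cells directly
```

Unlike the coin-toss approach in the question, this draws exactly round(frac * length(m)) distinct cells.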
Upvotes: 4
Reputation: 21433
My new favorite option:
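indexSampler <- function(m, p) {
  # nrow/ncol are named explicitly: matrix()'s second positional argument is nrow, not ncol
  matrix(sample(c(TRUE, FALSE), length(m), replace = TRUE, prob = c(p, 1 - p)),
         nrow = nrow(m), ncol = ncol(m))
}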
You won't get indices, but you'll get a logical (TRUE/FALSE) matrix with the same shape as m that can be used to index it directly.
It is ridiculously fast (a factor of 1000 for a matrix of 200x200, and also significantly faster for small matrices).
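If the row and column indices are needed after all, a logical mask like this can probably be converted with which(..., arr.ind = TRUE). The following is only a sketch: the matrix size, the seed, and the variable names mask and idx are assumptions chosen for illustration, and the mask is built inline so the example is self-contained.

```r
set.seed(42)                      # only to make the sketch reproducible
m <- matrix(0, nrow = 200, ncol = 300)
p <- 0.3

# Logical mask with the same shape as m; each cell is TRUE with probability p.
mask <- matrix(sample(c(TRUE, FALSE), length(m), replace = TRUE, prob = c(p, 1 - p)),
               nrow = nrow(m), ncol = ncol(m))

# Convert the mask to a two-column matrix of (row, col) positions.
idx <- which(mask, arr.ind = TRUE)
head(idx)
mean(mask)   # close to p, but not exactly p, because each cell is an independent draw
```

This keeps the behaviour of the original coin toss (approximately, not exactly, a fraction p of the cells), just vectorised.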
Upvotes: 1
Reputation: 56004
See this example:
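m <- 2              # number of columns
n <- 5              # number of rows
SampleSize <- 0.3   # fraction of entries to sample
# dummy data
x <- matrix(runif(m * n), nrow = n)
# sample: mark the chosen entries with a sentinel value
set.seed(123)
temp <- x
temp[sample(1:length(temp), round(length(temp) * SampleSize))] <- -9
# index: logical matrix, TRUE where an entry was sampled
ix <- temp == -9
ix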
# [,1] [,2]
# [1,] FALSE FALSE
# [2,] FALSE FALSE
# [3,] TRUE TRUE
# [4,] TRUE FALSE
# [5,] FALSE FALSE
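The sentinel value -9 works here because runif never produces it, but it would give false positives on data that already contains -9. A possible variant, sketched under the assumption that only the positions matter, fills a logical matrix directly; the name frac is introduced for illustration.

```r
set.seed(123)
x <- matrix(runif(10), nrow = 5)
frac <- 0.3   # fraction of entries to sample (illustrative name)

ix <- matrix(FALSE, nrow = nrow(x), ncol = ncol(x))   # start with nothing selected
ix[sample.int(length(x), round(length(x) * frac))] <- TRUE

which(ix, arr.ind = TRUE)   # row/column positions of the sampled cells
```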
Upvotes: 1