Reputation: 1029
I am running this for loop without any problems, but it takes a long time. I guess it could be faster with the apply family, but I'm not sure how. Any hints?
set.seed(1)
nrows <- 1200
ncols <- 1000
outmat <- matrix(NA, nrows, ncols)
dat <- matrix(5, nrows, ncols)

for (nc in 1:ncols) {
  for (nr in 1:nrows) {
    val <- dat[nr, nc]
    if (!is.na(val)) {
      # in my real data, dir2 is a list of files:
      # dir2 <- list.files("/data/dir2", "*.dat", full.names = TRUE)
      file <- readBin(dir2[val], numeric(), size = 4, n = 1200 * 1000)
      file <- matrix(file, nrow = 1200, ncol = 1000)  # my real data
      outmat[nr, nc] <- file[nr, nc]
    }
  }
}
Upvotes: 0
Views: 176
Reputation: 24490
Two solutions.
The first uses more memory, but is more efficient and, I guess, feasible since you have 24 files, as you stated. You read all the files at once, then subset them according to dat. Something like:
allContents <- do.call(cbind, lapply(dir2, readBin, what = "numeric",
                                     n = nrows * ncols, size = 4))
res <- matrix(allContents[cbind(seq_along(dat), c(dat) + 1)], nrows, ncols)
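Here is a minimal self-contained sketch to test the idea (the three temp files and the 0-2 coding are made up for illustration, since I don't have your data):

# simulate three small binary files and a dat matrix of 0-based file codes
dir2 <- replicate(3, tempfile(fileext = ".dat"))
for (f in dir2) writeBin(rnorm(nrows * ncols), f, size = 4)
dat <- matrix(sample(0:2, nrows * ncols, replace = TRUE), nrows, ncols)

allContents <- do.call(cbind, lapply(dir2, readBin, what = "numeric",
                                     n = nrows * ncols, size = 4))
res <- matrix(allContents[cbind(seq_along(dat), c(dat) + 1)], nrows, ncols)

# spot check: element (5, 7) must match the same cell of file dat[5, 7] + 1
f <- matrix(readBin(dir2[dat[5, 7] + 1], "numeric", n = nrows * ncols, size = 4),
            nrows, ncols)
stopifnot(res[5, 7] == f[5, 7])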
The second one can handle a somewhat larger number of files (say 50-100). It reads a chunk of each file at every iteration and subsets accordingly. You have to open as many connections as the number of files you have. For instance:
outmat <- matrix(NA, nrows, ncols)
connections <- lapply(dir2, file, open = "rb")
for (i in 1:ncols) {
  # read the next nrows values (i.e. one column) from every file
  values <- vapply(connections, readBin, numeric(nrows),
                   what = "numeric", n = nrows, size = 4)
  outmat[, i] <- values[cbind(seq_len(nrows), dat[, i] + 1)]
}
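One small addition to the above: release the file handles once the loop has finished.

# close all open connections
invisible(lapply(connections, close))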
The +1 after dat is due to the fact that, as you stated in the comments, the values in dat range from 0 to 23, while R indexing is 1-based.
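For instance, a stored code of 0 has to select the first file:

dir2[0 + 1]  # code 0 -> first element of dir2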
Upvotes: 3