I am dealing with some large data in R: I have a vector of normally distributed random numbers of length about 6400*50000, and I need to sum every 4 consecutive elements of this vector to get a smaller one. Is there an efficient way to do this in R?
My thoughts so far (example code):
library(parallel)
library(doParallel)
library(foreach)
library(RcppZiggurat)   # for zrnorm(), a fast normal RNG

coreNums <- detectCores()
N1 <- 6400
K <- 50000   # assumed: gives length(vector1) = 6400*50000 as stated above
M <- 4       # group size
N2 <- N1/M

cl <- makeCluster(getOption("cl.cores", coreNums))
registerDoParallel(cl)

vector1 <- zrnorm(N1*K)

# sum each consecutive block of M elements; note the parentheses
# around both halves of the index range
vector2 <- foreach(i = 1:(N2*K)) %dopar% {
  sum(vector1[(M*(i-1)+1):(M*i)])
}
vector2 <- unlist(vector2)
stopCluster(cl)
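To make the goal concrete, here is a toy-sized check of the block sums (the sizes are just for illustration):

x <- 1:8   # toy input, group size 4
sapply(1:2, function(i) sum(x[(4*(i-1)+1):(4*i)]))
#> [1] 10 26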
I think colSums is the function you are looking for.
vector1 = rnorm(1000*50000)
dim(vector1) = c(10, length(vector1)/10)  # view the vector as a 10-row matrix
vector2 = colSums(vector1)                # one sum per column of 10 values
In my opinion, the task is too simple for parallelization. I also did not run into any problems with a matrix of this size.
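For the sizes in the question (group size 4), the same trick would look like this; a sketch, assuming the full vector, roughly 2.5 GB of doubles, fits in memory:

M = 4
vector1 = rnorm(6400*50000)             # about 2.5 GB of doubles
dim(vector1) = c(M, length(vector1)/M)  # one group of 4 per column
vector2 = colSums(vector1)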
If you want to use less memory, here is code that does the same thing in chunks of 10,000 values of vector1.
vector2 = double(length(vector1)/10)
for (i in seq_len(length(vector1)/10000)) {  # 10,000 input values per chunk
  part = vector1[((i-1)*10000+1):(i*10000)]
  dim(part) = c(10, 1000)                    # chunk as a 10 x 1000 matrix
  vector2[((i-1)*1000+1):(i*1000)] = colSums(part)
}
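If the dim() bookkeeping bothers you, base R also has the low-level .colSums(), which treats a plain vector as an m-by-n matrix directly (see ?colSums); a minimal sketch for the same group size of 10:

vector2 = .colSums(vector1, 10, length(vector1)/10)

This processes the whole vector at once, so compared with the chunked loop it saves code rather than memory.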