Lei_Xu

Reputation: 155

Dealing with a very large vector in R

I am dealing with some large data in R:

I have a vector of normally distributed random numbers of length about 6400*50000, and I need to sum every 4 consecutive elements of this vector to get a smaller one.

Is there any efficient way to do this in R?
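For example, with a toy vector x (the blocks do not overlap), the result I want would be:

x <- c(1, 2, 3, 4, 5, 6, 7, 8)
c(sum(x[1:4]), sum(x[5:8]))   # expected result: 10 26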

What I have tried so far:

  1. Reshaping into a matrix with ncol=10 and using the apply function; this failed because the matrix was too big (see the sketch after this list).
  2. Trying the parallel and foreach packages, but no real progress yet.
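Roughly, the idea behind point 1 looked like this (a sketch only, with each block of 4 stored as one matrix column; my actual attempt used a different matrix shape):

m <- matrix(vector1, nrow = 4)   # one block of 4 values per column
vector2 <- apply(m, 2, sum)      # sum each column, i.e. each block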

Example code for the parallel attempt:

library(parallel)
library(RcppZiggurat)
library(doParallel)
library(foreach)

coreNums <- detectCores()
N1 = 6400
K = 50000   # the full vector has N1*K elements
M = 4       # block size
N2 = N1/M
cl <- makeCluster(getOption("cl.cores", coreNums))
registerDoParallel(cl)
vector1 <- zrnorm(N1*K)   # normally distributed random numbers (Ziggurat method)
# sum each non-overlapping block of M elements
vector2 = foreach(i = 1:(N2*K)) %dopar% { sum(vector1[(M*(i-1)+1):(M*i)]) }
vector2 = unlist(vector2)
stopCluster(cl)

Upvotes: 1

Views: 2545

Answers (1)

Andrey Shabalin

Reputation: 4614

I think colSums is the function you are looking for.

vector1 = rnorm(1000*50000)
dim(vector1) = c(10, length(vector1)/10)   # one block of 10 values per column
vector2 = colSums(vector1)                 # sum of each block

In my opinion, the task is too simple for parallelization. Also, I did not run into any problems with the matrix size.

If you want to use less memory, here is code that does the same thing in chunks of 10,000 values of vector1.

vector2 = double(length(vector1)/10)
# process vector1 in chunks of 10,000 values (each chunk yields 1,000 sums)
for( i in seq_len(length(vector1)/10000) ){
    part = vector1[((i-1)*10000+1):(i*10000)]
    dim(part) = c(10, 1000)
    vector2[((i-1)*1000+1):(i*1000)] = colSums(part)
}
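If you need blocks of 4 as in the question, the same approach should work with 4 rows instead of 10 (a sketch, assuming the vector length is a multiple of 4):

dim(vector1) = c(4, length(vector1)/4)   # one block of 4 values per column
vector2 = colSums(vector1)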

Upvotes: 2
