SivanG

Reputation: 9

efficiency of rbind

I'm writing a script that has to build a large matrix. I want to take a vector of names and, for each name, get data from a different data frame, do some operations on it, and then return a vector of data for that name. For example:

allNew=matrix(ncol=ncol(X)-1);
for(name in list)
    {
    tmpdata=all[grep(name,list$Names),];
    data=(as.data.frame(apply(tmpdata[,2:(ncol(tmpdata)-1)],2,sum))==nrow(tmpdata))*1
    colnames(data)=name;
    data=t(data);
    allNew=rbind(allNew,data);
    }

The names list has around 10,000 entries, and for each name tmpdata has 1-5 rows. I'm running my code on my lab's Linux server with about 8 GB of RAM. Somehow I feel this is taking a lot longer than it should: it takes a few minutes. How can I do this more efficiently?

Upvotes: 0

Views: 718

Answers (1)

Gregor Thomas

Reputation: 145805

As the comments pointed out, growing an object one row at a time is much slower than overwriting parts of a pre-allocated object, because each rbind() call copies everything accumulated so far. Something like this should work, though without any test data it's hard to be sure.

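# Pre-allocate the full result once: one row per name
allNew <- matrix(NA, nrow = length(list), ncol = ncol(X) - 1)
for (i in seq_along(list))
    {
    name <- list[i]  # assumes `list` is the vector of names the question loops over
    tmpdata <- all[grep(name, list$Names), ]
    # 1 where a column's sum equals the number of rows in tmpdata, else 0
    data <- (as.data.frame(apply(tmpdata[, 2:(ncol(tmpdata) - 1)], 2, sum)) == nrow(tmpdata)) * 1
    colnames(data) <- name
    allNew[i, ] <- t(data)  # overwrite row i in place instead of calling rbind()
    }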
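
To make the difference concrete, here is a minimal self-contained sketch that compares the two patterns on toy data. The sizes `n` and `k` and the helper `make_row` are hypothetical stand-ins for the per-name computation in the question, not taken from it, and timings will vary by machine.

```r
# Hypothetical toy setup: n rows of k values each.
n <- 2000
k <- 5
make_row <- function(i) rep(i, k)  # stand-in for the real per-name work

# Growing: each rbind() allocates a new matrix and copies all earlier rows.
grow <- function() {
  out <- NULL
  for (i in seq_len(n)) out <- rbind(out, make_row(i))
  out
}

# Pre-allocating: the matrix is created once and filled row by row.
prealloc <- function() {
  out <- matrix(NA_real_, nrow = n, ncol = k)
  for (i in seq_len(n)) out[i, ] <- make_row(i)
  out
}

a <- grow()
b <- prealloc()
stopifnot(identical(dim(a), dim(b)), all(a == b))  # same result either way

# Compare elapsed times on your own machine.
print(system.time(grow()))
print(system.time(prealloc()))
```

Note also that `matrix(ncol = ...)` with no data creates a 1-row matrix of NA, so the rbind() version in the question ends up with an extra NA row at the top; pre-allocating with `nrow = length(list)` avoids that as well.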

Upvotes: 1
